ASYMPTOTIC EQUIVALENCE FOR INFERENCE ON THE
VOLATILITY FROM NOISY OBSERVATIONS

By Markus Reiß
Humboldt-Universität zu Berlin

We consider discrete-time observations of a continuous martin- 
gale under measurement error. This serves as a fundamental model 
for high-frequency data in finance, where an efficient price process 
is observed under microstructure noise. It is shown that this non- 
parametric model is in Le Cam's sense asymptotically equivalent to 
a Gaussian shift experiment in terms of the square root of the volatility
function $\sigma$ and a nonstandard noise level. As an application, new
rate-optimal estimators of the volatility function and simple efficient 
estimators of the integrated volatility are constructed. 

1. Introduction. In recent years, volatility estimation from high-frequen- 
cy data has attracted a lot of attention in financial econometrics and statis- 
tics. Due to empirical evidence that the observed transaction prices of as- 
sets cannot follow a discretely sampled semi-martingale model, a promi- 
nent approach is to model the observations as the superposition of the true 
(or efficient) price process with some measurement error, conceived as mi- 
crostructure noise. Main features are already present in the basic model of 
observing 

(1.1) $Y_i = X_{i/n} + \varepsilon_i, \quad i = 1, \dots, n,$

with an efficient price process $X_t = \int_0^t \sigma(s)\,dB_s$, $B$ a standard Brownian
motion, and $\varepsilon_i \sim N(0, \delta^2)$ all independent. The aim is to perform statistical
inference on the volatility function $\sigma\colon [0,1] \to \mathbb{R}^+$, for example, estimating
the so-called integrated volatility $\int_0^1 \sigma^2(t)\,dt$ over the trading day.
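To make the basic model (1.1) concrete, the following minimal sketch simulates the observations and shows why the naive realized variance is useless under noise. It assumes an Euler-type discretization (the volatility is frozen at the left endpoint of each sampling interval); the function names, the seed and the parameter values are illustrative choices, not part of the paper.

```python
import math
import random

def simulate_observations(n, sigma, delta, rng):
    """Draw Y_i = X_{i/n} + eps_i, i = 1..n, from model (1.1).

    sigma is a callable volatility function on [0, 1]; the stochastic
    integral is approximated by freezing sigma at the left endpoint of
    each sampling interval (an assumption of this sketch).
    """
    x = 0.0
    ys = []
    for i in range(1, n + 1):
        # Increment of X over [(i-1)/n, i/n]: approximately N(0, sigma^2 / n).
        x += sigma((i - 1) / n) * math.sqrt(1.0 / n) * rng.gauss(0.0, 1.0)
        # Add independent microstructure noise eps_i ~ N(0, delta^2).
        ys.append(x + delta * rng.gauss(0.0, 1.0))
    return ys

def realized_variance(ys):
    """Sum of squared increments of the observed series."""
    return sum((b - a) ** 2 for a, b in zip(ys, ys[1:]))

rng = random.Random(0)
n, delta = 2000, 0.1
ys = simulate_observations(n, lambda t: 1.0, delta, rng)
rv = realized_variance(ys)
# The integrated volatility equals 1 here, but the noise adds about 2*n*delta^2 = 40.
print(rv)
```

With constant $\sigma = 1$ the target $\int_0^1 \sigma^2(t)\,dt$ equals 1, while the realized variance of the noisy series is dominated by the noise contribution of order $2n\delta^2$.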

The mathematical foundation for the parametric formulation of this model
has been laid by Gloter and Jacod (2001a) who prove the interesting result
that the model is locally asymptotically normal (LAN) as $n \to \infty$, but
with the unusual rate $n^{-1/4}$, while without microstructure noise the rate is
$n^{-1/2}$. Starting with Zhang, Mykland and Aït-Sahalia (2005), the nonparametric
model has come into the focus of research. Mainly three different,
but closely related approaches have been proposed afterwards to estimate
the integrated volatility: multi-scale estimators [Zhang (2006)], realized kernels
or autocovariances [Barndorff-Nielsen et al. (2008)] and preaveraging
[Jacod et al. (2009)]. Under various degrees of generality, especially also for
stochastic volatility, all authors provide central limit theorems with convergence
rate $n^{-1/4}$ and an asymptotic variance involving the so-called quarticity
$\int_0^1 \sigma^4(t)\,dt$. Recently, also rate-optimal estimators for the spot volatility
$\sigma^2(t)$ have been proposed [Munk and Schmidt-Hieber (2010), Hoffmann,
Munk and Schmidt-Hieber (2010)].

The aim of the present paper is to provide a thorough mathematical un- 
derstanding of the basic model, to explain more profoundly why statistical 
inference is not so canonical and to propose a simple estimator of the inte- 
grated volatility which is efficient. To this end, we employ Le Cam's concept 
of asymptotic equivalence between experiments. In fact, our main theoretical
result in Theorem 6.2 states under the $\alpha$-Hölder-regularity condition
$\alpha > (1+\sqrt{5})/4 \approx 0.81$ for $\sigma^2(\cdot)$ that observing $(Y_i)$ in (1.1) is for $n \to \infty$
asymptotically equivalent to observing the Gaussian shift experiment

$dY_t = \sqrt{2\sigma(t)}\,dt + \delta^{1/2} n^{-1/4}\,dW_t, \quad t \in [0,1],$

with Gaussian white noise $dW$. By the Brown and Low (1996) result, we obtain
a fortiori asymptotic equivalence with the regression model

$Y_i = \sqrt{2\sigma(i/\sqrt{n})} + \delta^{1/2}\varepsilon_i, \quad i = 1, \dots, \sqrt{n}, \quad \varepsilon_i \sim N(0,1)$ i.i.d.

Not only the large noise level $\delta^{1/2} n^{-1/4}$ is apparent, but also a nonlinear
$\sqrt{\sigma(t)}$-form of the signal, from which optimal asymptotic variance results
can be derived. Note that a similar form of a Gaussian shift was found to be
asymptotically equivalent to nonparametric density estimation [Nussbaum
(1996)]. Key ingredients of our asymptotic equivalence proof are the results
by Grama and Nussbaum (2002) on asymptotic equivalence for generalized
nonparametric regression, but also ideas from Carter (2006) and Reiß (2008)
play a role. Moreover, fine bounds on Hellinger distances for Gaussian measures
with different covariance operators turn out to be essential.

Roughly speaking, asymptotic equivalence means that any statistical in- 
ference procedure can be transferred from one experiment to the other such 
that the asymptotic risk remains the same, at least for bounded loss functions.
Technically, two sequences of experiments $\mathcal{E}_n$ and $\mathcal{G}_n$, defined on possibly
different sample spaces, but with the same parameter set, are asymptotically
equivalent if the Le Cam distance $\Delta(\mathcal{E}_n, \mathcal{G}_n)$ tends to zero. For
$\mathcal{E}_i = (\mathcal{X}_i, \mathcal{F}_i, (P^i_\vartheta)_{\vartheta \in \Theta})$, $i = 1, 2$, by definition,
$\Delta(\mathcal{E}_1, \mathcal{E}_2) = \max(\delta(\mathcal{E}_1, \mathcal{E}_2), \delta(\mathcal{E}_2, \mathcal{E}_1))$
holds in terms of the deficiency
$\delta(\mathcal{E}_1, \mathcal{E}_2) = \inf_M \sup_{\vartheta \in \Theta} \|M P^1_\vartheta - P^2_\vartheta\|_{TV}$,
where the infimum is taken over all randomizations or Markov kernels $M$
from $(\mathcal{X}_1, \mathcal{F}_1)$ to $(\mathcal{X}_2, \mathcal{F}_2)$; see, for example, Le Cam and Yang (2000) for
details. In particular, $\delta(\mathcal{E}_1, \mathcal{E}_2) = 0$ means that $\mathcal{E}_1$ is more informative than
$\mathcal{E}_2$ in the sense that any observation in $\mathcal{E}_2$ can be obtained from $\mathcal{E}_1$, possibly
using additional randomizations. Here, we shall always explicitly construct
the transformations and randomizations and we shall then only use that
$\Delta(\mathcal{E}_1, \mathcal{E}_2) \le \sup_{\vartheta \in \Theta} \|P^1_\vartheta - P^2_\vartheta\|_{TV}$ holds when both experiments are defined
on the same sample space.
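The total variation bound for experiments on a common sample space can be illustrated with a toy computation. The sketch below is not taken from the paper: it uses two hypothetical families of unit-variance normal laws on the real line and the standard closed form for the total variation distance between two normals with equal variance (with the convention that the distance takes values in $[0,1]$).

```python
import math

def tv_gaussian_shift(mu1, mu2, s=1.0):
    """Total variation distance between N(mu1, s^2) and N(mu2, s^2).

    Closed form 2*Phi(d / (2 s)) - 1 = erf(d / (2 sqrt(2) s)) with d = |mu1 - mu2|,
    using the convention that the distance takes values in [0, 1].
    """
    return math.erf(abs(mu1 - mu2) / (2.0 * math.sqrt(2.0) * s))

def le_cam_upper_bound(n, grid_size=101):
    """sup over theta in [0, 1] of TV(N(theta, 1), N(theta + theta^2 / n, 1)).

    The two toy families are hypothetical and live on the same sample space,
    so this supremum bounds their Le Cam distance from above.
    """
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    return max(tv_gaussian_shift(th, th + th * th / n) for th in thetas)

for n in (10, 100, 1000):
    print(n, le_cam_upper_bound(n))
```

The bound decays like $1/n$ in this toy example, so the two hypothetical sequences of experiments are asymptotically equivalent without any randomization being needed.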

The asymptotic equivalence is deduced stepwise. In Section 2, the regression-type
model (1.1) is shown to be asymptotically equivalent to a corresponding
white noise model with signal $X$. Then in Section 3, a very simple
construction yields a Gaussian shift model with signal $\log(\sigma^2(\cdot) + c)$, $c > 0$
some constant, which is asymptotically less informative, but only by a constant
factor in the Fisher information. Inspired by this construction, we present
a generalization in Section 4 where the information loss can be made arbitrarily
small (but not zero), before applying nonparametric local asymptotic
theory in Section 5 to derive asymptotic equivalence with our final
Gaussian shift model for shrinking local neighborhoods of the parameters.
Section 6 yields the global result, which is based on an asymptotic sufficiency
result for simple independent statistics.

Extensions and restrictions are discussed in Section 7, where we also present
a counter-example which shows that asymptotic equivalence fails for Hölder
smoothness $\alpha < 1/3$ of the volatility function $\sigma^2(\cdot)$. To determine whether
asymptotic equivalence holds or fails for $\alpha \in [1/3, (1+\sqrt{5})/4]$ remains a challenging
open problem. In Section 8, we use the theoretical insight to construct
a rate-optimal estimator of the spot volatility and an efficient estimator
of the integrated volatility by a genuine local-likelihood approach. Remarkably,
the asymptotic variance is found to depend on the third moment
$\int_0^1 \sigma^3(t)\,dt$ and for nonconstant $\sigma^2(\cdot)$ our estimator outperforms previous
approaches applied to the basic model. Constructions needed for the proof
are presented and discussed alongside the mathematical results, deferring
more technical parts to the Appendix, which in Section A.1 also contains
a summary of results on white noise models, the Hellinger distance and
Hilbert-Schmidt norm estimates.

2. The regression and white noise model. In the main part, we shall 
work in the white noise setting, which is more intuitive to handle than the 
regression setting, which in turn is the observation model in practice. Let 
us define both models formally. For that, we introduce the Holder ball 

$C^\alpha(R) := \{f \in C^\alpha([0,1]) \mid \|f\|_{C^\alpha} \le R\}$

with $\|f\|_{C^\alpha} = \|f\|_\infty + \sup_{x \ne y} \frac{|f(x) - f(y)|}{|x - y|^\alpha}$.

Definition 2.1. Let $\mathcal{E}_0 = \mathcal{E}_0(n, \delta, \alpha, R, \underline{\sigma}^2)$ with $n \in \mathbb{N}$, $\delta > 0$, $\alpha \in (0,1)$,
$R > 0$, $\underline{\sigma}^2 > 0$ be the statistical experiment generated by observing (1.1).
The volatility $\sigma^2$ belongs to the class

$\mathcal{S}(\alpha, R, \underline{\sigma}^2) := \{\sigma^2 \in C^\alpha(R) \mid \min_{t \in [0,1]} \sigma^2(t) \ge \underline{\sigma}^2\}.$

Let $\mathcal{E}_1 = \mathcal{E}_1(\varepsilon, \alpha, R, \underline{\sigma}^2)$ with $\varepsilon > 0$, $\alpha \in (0,1)$, $R > 0$, $\underline{\sigma}^2 > 0$ be the statistical
experiment generated by observing

$dY_t = X_t\,dt + \varepsilon\,dW_t, \quad t \in [0,1],$

with $X_t = \int_0^t \sigma(s)\,dB_s$ as above, independent standard Brownian motions $W$
and $B$ and $\sigma^2 \in \mathcal{S}(\alpha, R, \underline{\sigma}^2)$.

From Brown and Low (1996), it is well known that the white noise and
the Gaussian regression model are asymptotically equivalent for noise level
$\varepsilon = \delta/\sqrt{n} \to 0$ as $n \to \infty$, provided the signal is $\beta$-Hölder continuous for
$\beta > 1/2$. Since Brownian motion and thus also our underlying process $X$ is only
Hölder continuous of order $\beta < 1/2$ (whatever $\alpha$ is), it is not clear whether
asymptotic equivalence can hold for the experiments $\mathcal{E}_0$ and $\mathcal{E}_1$. Yet, this
is true. Subsequently, we employ the notation $A_n \lesssim B_n$ if $A_n = O(B_n)$ and
$A_n \sim B_n$ if $A_n \lesssim B_n$ as well as $B_n \lesssim A_n$ and obtain the following theorem.

Theorem 2.2. For any $\alpha > 0$, $\underline{\sigma}^2 > 0$ and $\delta, R > 0$ the experiments $\mathcal{E}_0$
and $\mathcal{E}_1$ with $\varepsilon = \delta/\sqrt{n}$ are asymptotically equivalent; more precisely,

$\Delta(\mathcal{E}_0(n, \delta, \alpha, R, \underline{\sigma}^2), \mathcal{E}_1(\delta/\sqrt{n}, \alpha, R, \underline{\sigma}^2)) \lesssim R\,\delta^{-2} n^{-\alpha}.$

Interestingly, the asymptotic equivalence holds for any positive Hölder
regularity $\alpha > 0$. In particular, for this result the volatility $\sigma^2$ could be itself
a continuous semi-martingale, but such that $X$ conditionally on $\sigma^2$ remains
Gaussian. Let us also recall that by inclusion asymptotic equivalence always
holds for subclasses of functions, here for example for $C^m$-balls of $m$-times
continuously differentiable functions $\sigma^2$ so that we write $\alpha > 0$, meaning
arbitrarily small positive $\alpha$, and not $\alpha \in (0,1]$, which is more formal, but
misleading. As the proof in Section A.2 of the Appendix reveals, we construct
the equivalence by rate-optimal approximations of the anti-derivative of $\sigma^2$
which lies in $C^{1+\alpha}$. Similar techniques have been used by Carter (2006) and
Reiß (2008), but here we have to cope with the random signal for which
we need to bound the Hilbert-Schmidt norm of the respective covariance
operators. Note further that the asymptotic equivalence even holds when
the noise level $\delta$ tends to zero, provided $\delta^2 n^\alpha \to \infty$ remains valid.

3. Less informative Gaussian shift experiments. From now on, we shall
work with the white noise observation experiment $\mathcal{E}_1$, where the main structures
are more clearly visible. In this section, we shall find easy Gaussian
shift models which are asymptotically not more informative than $\mathcal{E}_1$, but
already permit rate-optimal estimation results. The whole idea is easy to
grasp once we can replace the volatility $\sigma^2$ by a piecewise constant approximation
on small blocks of size $h$. That this is no loss of generality is shown
by the subsequent asymptotic equivalence result, proved in Section A.3 of
the Appendix.

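Definition 3.1. Let $\mathcal{E}_2 = \mathcal{E}_2(\varepsilon, h, \alpha, R, \underline{\sigma}^2)$ be the statistical experiment
generated by observing

$dY_t = X^h_t\,dt + \varepsilon\,dW_t, \quad t \in [0,1],$

with $X^h_t = \int_0^t \sigma(\lfloor s \rfloor_h)\,dB_s$, $\lfloor s \rfloor_h := \lfloor s/h \rfloor h$ for $h > 0$ and $h^{-1} \in \mathbb{N}$, and independent
standard Brownian motions $W$ and $B$. The volatility $\sigma^2$ belongs to
the class $\mathcal{S}(\alpha, R, \underline{\sigma}^2)$.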

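Proposition 3.2. Assume $\alpha \in (1/2, 1]$ and $\underline{\sigma}^2 > 0$. Then for $\varepsilon \to 0$,
$h^\alpha = o(\varepsilon^{1/2})$ the experiments $\mathcal{E}_1$ and $\mathcal{E}_2$ are asymptotically equivalent; more
precisely,

$\Delta(\mathcal{E}_1(\varepsilon, \alpha, R, \underline{\sigma}^2), \mathcal{E}_2(\varepsilon, h, \alpha, R, \underline{\sigma}^2)) \lesssim R\,\underline{\sigma}^{-3/2} h^\alpha \varepsilon^{-1/2}.$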

In the sequel, we always assume $h^\alpha = o(\varepsilon^{1/2})$ to hold such that we can
work equivalently with $\mathcal{E}_2$. Recall that observing $Y$ in a white noise model
is equivalent to observing $(\int e_m\,dY)_{m \ge 1}$ for an orthonormal basis $(e_m)_{m \ge 1}$
of $L^2([0,1])$; cf. also Section A.1 below. Our first step is thus to find an
orthonormal system (not a basis) which extracts as much local information
on $\sigma^2$ as possible. For any $\varphi \in L^2([0,1])$ with $\|\varphi\|_{L^2} = 1$, we have by partial
integration

$\int_0^1 \varphi(t)\,dY_t = \int_0^1 \varphi(t) X^h_t\,dt + \varepsilon \int_0^1 \varphi(t)\,dW_t$

(3.1) $\qquad = -\int_0^1 \Phi(t)\sigma(\lfloor t \rfloor_h)\,dB_t + \varepsilon \int_0^1 \varphi(t)\,dW_t$

$\qquad = \Big(\int_0^1 \Phi^2(t)\sigma^2(\lfloor t \rfloor_h)\,dt + \varepsilon^2\Big)^{1/2} \zeta_\varphi,$

where $\Phi(t) = -\int_t^1 \varphi(s)\,ds$ is the antiderivative of $\varphi$ with $\Phi(1) = 0$ (so that,
together with $X^h_0 = 0$, the boundary terms of the partial integration vanish) and
$\zeta_\varphi \sim N(0,1)$ holds. To ensure that $\Phi$ has only support in some interval
$[kh, (k+1)h]$, we require $\varphi$ to have support in $[kh, (k+1)h]$ and to satisfy
$\int \varphi(t)\,dt = 0$. The function $\varphi_k$ with $\mathrm{supp}(\varphi_k) = [kh, (k+1)h]$, $\|\varphi_k\|_{L^2} = 1$,
$\int \varphi_k(t)\,dt = 0$ that maximizes the information load $\int \Phi_k^2(t)\,dt$ for $\sigma^2(kh)$
is given by (use Lagrange theory)

(3.2) $\varphi_k(t) = \sqrt{2}\,h^{-1/2} \cos(\pi(t - kh)/h)\,\mathbf{1}_{[kh,(k+1)h]}(t), \quad t \in [0,1].$

The $L^2$-orthonormal system $(\varphi_k)$ for $k = 0, 1, \dots, h^{-1} - 1$ is now used to
construct Gaussian shift observations. In $\mathcal{E}_2$, we obtain from (3.1) the observations

(3.3) $y_k := \int \varphi_k(t)\,dY_t = (h^2\pi^{-2}\sigma^2(kh) + \varepsilon^2)^{1/2} \zeta_k, \quad k = 0, \dots, h^{-1} - 1,$

with independent standard normal random variables $(\zeta_k)_{k=0,\dots,h^{-1}-1}$. Observing
$(y_k)$ is equivalent to observing

(3.4) $z_k := \log(y_k^2 h^{-2}\pi^2) - E[\log(\zeta_k^2)] = \log(\sigma^2(kh) + \varepsilon^2 h^{-2}\pi^2) + \eta_k$

for $k = 0, \dots, h^{-1} - 1$ with $\eta_k := \log(\zeta_k^2) - E[\log(\zeta_k^2)]$ since $(y_k^2)$ is a sufficient
statistic in (3.3) and the logarithm is one-to-one.
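The variance factor $h^2\pi^{-2}$ in (3.3) is the information load $\int \Phi_k^2(t)\,dt$ of the cosine function (3.2). As a sanity check, the sketch below verifies numerically that $\varphi_k$ is normalized, integrates to zero and has this information load. The midpoint rule, the block index and the block size are arbitrary choices made for illustration.

```python
import math

def phi(t, k, h):
    """Cosine function (3.2) supported on the block [kh, (k+1)h]."""
    if k * h <= t <= (k + 1) * h:
        return math.sqrt(2.0 / h) * math.cos(math.pi * (t - k * h) / h)
    return 0.0

def Phi(t, k, h):
    """Antiderivative of phi with Phi(1) = 0; a sine bump on the same block."""
    if k * h <= t <= (k + 1) * h:
        return math.sqrt(2.0 * h) / math.pi * math.sin(math.pi * (t - k * h) / h)
    return 0.0

def integrate_block(f, k, h, m=20000):
    """Midpoint rule for the integral of f over [kh, (k+1)h]."""
    step = h / m
    return step * sum(f(k * h + (i + 0.5) * step) for i in range(m))

h, k = 0.1, 3
norm_sq = integrate_block(lambda t: phi(t, k, h) ** 2, k, h)
mean = integrate_block(lambda t: phi(t, k, h), k, h)
load = integrate_block(lambda t: Phi(t, k, h) ** 2, k, h)
print(norm_sq, mean, load, h ** 2 / math.pi ** 2)
```

The last two printed numbers should agree, which is exactly the coefficient of $\sigma^2(kh)$ in the variance of $y_k$.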

We have found a nonparametric regression model with regression function
$\log(\sigma^2(\cdot) + \varepsilon^2 h^{-2}\pi^2)$ and $h^{-1}$ equidistant observations corrupted by non-Gaussian,
but centered noise $(\eta_k)$ of variance $\pi^2/2$. To ensure that the regression
function does not change under the asymptotics $\varepsilon \to 0$, we specify the block
size $h = h(\varepsilon) = h_0\varepsilon$ with some fixed constant $h_0 > 0$.

It is not surprising that the nonparametric regression experiment in (3.4)
is equivalent to a corresponding Gaussian shift experiment. Indeed, this
follows readily from results by Grama and Nussbaum (2002) who in their
Section 4.2 derive asymptotic equivalence already for our Gaussian scale
model (3.3). Note, however, that their Fisher information for $\vartheta = \sigma^2$ must
be corrected to $I(\vartheta) = \frac{1}{2}\vartheta^{-2}$. We then obtain directly asymptotic equivalence
of (3.3) with the Gaussian regression model

$w_k = \frac{1}{\sqrt{2}} \log(\sigma^2(kh) + h_0^{-2}\pi^2) + \gamma_k, \quad k = 0, \dots, h^{-1} - 1,$

where $\gamma_k \sim N(0,1)$ i.i.d. Since by the classical result of Brown and Low
(1996) or by Reiß (2008) the Gaussian regression is equivalent to the corresponding
white noise experiment [note that $\log(\sigma^2(\cdot) + h_0^{-2}\pi^2)$ is also
$\alpha$-Hölder continuous], we have already derived an important and far-reaching
result.

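Theorem 3.3. For $\alpha > 1/2$ and $\underline{\sigma}^2 > 0$ the high-frequency experiment
$\mathcal{E}_1(\varepsilon, \alpha, R, \underline{\sigma}^2)$ is asymptotically more informative than the Gaussian shift
experiment $\mathcal{G}_1(\varepsilon, \alpha, R, \underline{\sigma}^2, h_0)$ of observing

$dZ_t = \frac{1}{\sqrt{2}} \log(\sigma^2(t) + h_0^{-2}\pi^2)\,dt + h_0^{1/2}\varepsilon^{1/2}\,dW_t, \quad t \in [0,1].$

Here $h_0 > 0$ is an arbitrary constant and $\sigma^2 \in \mathcal{S}(\alpha, R, \underline{\sigma}^2)$.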
Remark 3.4. Moving the constants from the diffusion to the drift part,
the experiment $\mathcal{G}_1$ is equivalent to observing

(3.5) $d\tilde{Z}_t = (2h_0)^{-1/2} \log(\sigma^2(t) + h_0^{-2}\pi^2)\,dt + \varepsilon^{1/2}\,dW_t, \quad t \in [0,1].$

Writing $\varepsilon = \delta/\sqrt{n}$ gives us the noise level $\delta^{1/2} n^{-1/4}$ which appears in all previous
work on the model $\mathcal{E}_0$.

To quantify the amount of information we have lost, let us study the LAN-property
of the constant parametric case $\sigma^2(t) = \sigma^2 > 0$ in $\mathcal{G}_1$. We consider
the local alternatives $\sigma^2 = \sigma_0^2 + \varepsilon^{1/2}$ for which we obtain the Fisher information
$I_{h_0} = (2h_0)^{-1} h_0^4 / (\pi^2 + h_0^2\sigma_0^2)^2$. Maximizing over $h_0$ yields $h_0 = \sqrt{3}\pi\sigma_0^{-1}$
and the Fisher information is at most equal to $\sup_{h_0 > 0} I_{h_0} = \sigma_0^{-3} 3^{3/2}/(32\pi) \approx
0.0517\,\sigma_0^{-3}$.

By the LAN-result of Gloter and Jacod (2001a) for $\mathcal{E}_0$, the best value is
$I(\sigma_0) = \frac{1}{8}\sigma_0^{-3}$ which is clearly larger. Note, however, that the relative (normalized)
efficiency is already $\sqrt{3^{3/2}/(4\pi)} \approx 0.64$, which means that we attain
here about 64% of the precision when working with $\mathcal{G}_1$ instead of $\mathcal{E}_0$ or $\mathcal{E}_1$.
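The constants in this remark are easy to confirm numerically. The following sketch maximizes the Fisher information of the constant-volatility submodel over $h_0$ by a crude grid search and compares it with the optimal value $\frac{1}{8}\sigma_0^{-3}$. The grid and the choice $\sigma_0 = 1$ are arbitrary assumptions of the example.

```python
import math

def fisher_info_g1(h0, sigma0):
    """Fisher information I_{h0} of the constant-volatility submodel of G_1."""
    return h0 ** 4 / (2.0 * h0 * (math.pi ** 2 + h0 ** 2 * sigma0 ** 2) ** 2)

sigma0 = 1.0
# Crude grid search over h0 in (0, 20]; the grid itself is an arbitrary choice.
grid = [0.001 * i for i in range(1, 20001)]
h_best = max(grid, key=lambda h0: fisher_info_g1(h0, sigma0))
i_best = fisher_info_g1(h_best, sigma0)
i_opt = 1.0 / (8.0 * sigma0 ** 3)       # optimal value quoted from Gloter and Jacod
efficiency = math.sqrt(i_best / i_opt)  # ratio on the standard-deviation scale
print(h_best, i_best, efficiency)
```

The maximizer is close to $\sqrt{3}\pi \approx 5.44$, the maximal information close to $0.0517$ and the normalized efficiency close to $0.64$.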

4. A close sequence of simple models. In order to decrease the information
loss in $\mathcal{G}_1$, we now take into account higher frequencies in each
block $[kh, (k+1)h]$ by using further trigonometric basis functions. In the
case of constant $\sigma^2$, the covariance operator of the observations is diagonalized
by the Karhunen-Loève basis for Brownian motion which together with
a blockwise approximation is exactly the idea here; see also the discussion in
Section 7. Equivalently, we can argue by a variational principle, maximizing
the information load as in the case of $\varphi_k$. In a frequency-location notation
$(j, k)$, we consider for $k = 0, 1, \dots, h^{-1} - 1$, $j \ge 1$,

(4.1) $\varphi_{jk}(t) = \sqrt{2}\,h^{-1/2} \cos(j\pi(t - kh)/h)\,\mathbf{1}_{[kh,(k+1)h]}(t), \quad t \in [0,1].$

This gives the corresponding antiderivatives

$\Phi_{jk}(t) = \frac{\sqrt{2h}}{\pi j} \sin(j\pi(t - kh)/h)\,\mathbf{1}_{[kh,(k+1)h]}(t), \quad t \in [0,1].$

Not only are the $(\varphi_{jk})$ and $(\Phi_{jk})$ localized on each block, but each single family
of functions is also orthogonal in $L^2([0,1])$. Working again on the piecewise
constant experiment $\mathcal{E}_2$, we extract the observations

(4.2) $y_{jk} := \int \varphi_{jk}(t)\,dY_t = (h^2\pi^{-2}j^{-2}\sigma^2(kh) + \varepsilon^2)^{1/2} \zeta_{jk}, \quad j \ge 1,\ k = 0, \dots, h^{-1} - 1,$

with $\zeta_{jk} \sim N(0,1)$ independent over all $(j, k)$. Note that independence follows
since $(\varphi_{jk})$ and $(\Phi_{jk})$ are both $L^2$-orthogonal families and the observations
are therefore uncorrelated. The same transformation as before leads
for each $j \ge 1$ to the regression model for $k = 0, \dots, h^{-1} - 1$

(4.3) $z_{jk} := \log(y_{jk}^2) - \log(h^2\pi^{-2}j^{-2}) - E[\log(\zeta_{jk}^2)] = \log(\sigma^2(kh) + \varepsilon^2 h^{-2}\pi^2 j^2) + \eta_{jk}.$

Applying the asymptotic equivalence result by Grama and Nussbaum (2002)
for each independent level $j$ separately, we immediately generalize Theorem 3.3.



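Theorem 4.1. For $\alpha > 1/2$ and $\underline{\sigma}^2 > 0$, the high-frequency experiment
$\mathcal{E}_1(\varepsilon, \alpha, R, \underline{\sigma}^2)$ is asymptotically more informative than the combined experiment
$\mathcal{G}_2(\varepsilon, \alpha, R, \underline{\sigma}^2, h_0, J)$ of independent Gaussian shifts

$dZ^j_t = \frac{1}{\sqrt{2}} \log(\sigma^2(t) + h_0^{-2}\pi^2 j^2)\,dt + h_0^{1/2}\varepsilon^{1/2}\,dW^j_t, \quad t \in [0,1],\ j = 1, \dots, J,$

with independent Brownian motions $(W^j)_{j=1,\dots,J}$ and $\sigma^2 \in \mathcal{S}(\alpha, R, \underline{\sigma}^2)$. The
constants $h_0 > 0$ and $J \in \mathbb{N}$ are arbitrary, but fixed.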



Remark 4.2. Let us again study the LAN-property of the constant parametric
case $\sigma^2(t) = \sigma^2 > 0$ for the local alternatives $\sigma^2 = \sigma_0^2 + \varepsilon^{1/2}$. We
obtain the Fisher information

$I_{h_0,J} = \sum_{j=1}^{J} \frac{(2h_0)^{-1} h_0^4}{(\pi^2 j^2 + h_0^2\sigma_0^2)^2}.$

In the limit $J \to \infty$ and $h_0 \to \infty$, we obtain by Riemann sum approximation

$\lim_{h_0 \to \infty} \lim_{J \to \infty} I_{h_0,J} = \int_0^\infty \frac{dx}{2(\pi^2 x^2 + \sigma_0^2)^2} = \frac{1}{8\sigma_0^3}.$

This is exactly the optimal Fisher information, obtained by Gloter and Jacod
(2001a) in this case. Note, however, that it is not at all obvious that we
may let $J, h_0 \to \infty$ in the asymptotic equivalence result. Moreover, in our
theory the restriction $h^\alpha = o(\varepsilon^{1/2})$ is necessary, which translates into
$h_0 = o(\varepsilon^{(1-2\alpha)/(2\alpha)})$. Still, the positive aspect is that we can come as close as we
wish to an asymptotically almost equivalent, but much simpler model. The
convergence $h_0 \to \infty$ is also an essential point in the final proof, starting
with the next section.
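The Riemann sum limit can be checked numerically. The sketch below assumes that the level-wise Fisher informations of the independent shifts add up, that is, it sums the single-level expression $(2h_0)^{-1}h_0^4/(\pi^2 j^2 + h_0^2\sigma_0^2)^2$ over $j = 1, \dots, J$. The specific pairs $(h_0, J)$ are arbitrary illustrations.

```python
import math

def fisher_info_g2(h0, J, sigma0):
    """Sum of the level-wise Fisher informations for j = 1..J (constant volatility)."""
    return sum(
        h0 ** 3 / (2.0 * (math.pi ** 2 * j ** 2 + h0 ** 2 * sigma0 ** 2) ** 2)
        for j in range(1, J + 1)
    )

sigma0 = 1.0
limit = 1.0 / (8.0 * sigma0 ** 3)
for h0, J in [(5.0, 1), (5.0, 1000), (20.0, 20000), (100.0, 100000)]:
    print(h0, J, fisher_info_g2(h0, J, sigma0), limit)
```

For fixed $h_0$ the sum stays strictly below the limit $1/(8\sigma_0^3)$, and the gap shrinks roughly like $1/(4h_0)$ for $\sigma_0 = 1$, which is why $h_0 \to \infty$ is needed to recover the full information.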
5. Localization. We know from standard regression theory [Stone (1982)]
that in the experiment $\mathcal{G}_1$ we can estimate $\sigma^2 \in C^\alpha$ in sup-norm with rate
$(\varepsilon \log(\varepsilon^{-1}))^{\alpha/(2\alpha+1)}$, using that the log-function is a $C^\infty$-diffeomorphism for
arguments bounded away from zero and infinity. Since $\mathcal{E}_1$ is for $\alpha > 1/2$
asymptotically more informative than $\mathcal{G}_1$, we can therefore localize $\sigma^2$ in
a neighborhood of some $\sigma_0^2$. Using the local coordinate $s^2$ in $\sigma^2 = \sigma_0^2 + v_\varepsilon s^2$
for $v_\varepsilon \to 0$, we define a localized experiment; cf. Nussbaum (1996).

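Definition 5.1. Let $\mathcal{E}_{i,\mathrm{loc}} = \mathcal{E}_{i,\mathrm{loc}}(\sigma_0, \varepsilon, \alpha, R, \underline{\sigma}^2)$ for $\sigma_0^2 \in \mathcal{S}(\alpha, R, \underline{\sigma}^2)$ be
the statistical subexperiment obtained from $\mathcal{E}_i(\varepsilon, \alpha, R, \underline{\sigma}^2)$ by restricting to
the parameters $\sigma^2 = \sigma_0^2 + v_\varepsilon s^2$ with $v_\varepsilon = \varepsilon^{\alpha/(2\alpha+1)} \log(\varepsilon^{-1})$ and unknown
$s^2 \in C^\alpha(R)$.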

We shall consider the observations $(y_{jk})$ in (4.2) derived from $\mathcal{E}_{2,\mathrm{loc}}$ and
multiplied by $\pi j/h$. The model is then a generalized nonparametric regression
family in the sense of Grama and Nussbaum (2002). On the sequence
space $(\mathcal{X}, \mathcal{F}) = (\mathbb{R}^{\mathbb{N}}, \mathfrak{B}^{\otimes\mathbb{N}})$, we consider for $\vartheta \in \Theta = [\underline{\sigma}^2, R]$ the Gaussian
product measure

(5.1) $P_\vartheta = \bigotimes_{j \ge 1} N(0, \vartheta + h_0^{-2}\pi^2 j^2).$

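The parameter $\vartheta$ plays the role of $\sigma^2(kh)$ for each $k$. By independence and
the result for the one-dimensional Gaussian scale model, the Fisher information
for $\vartheta$ is given by

(5.2) $I(\vartheta) = \sum_{j=1}^{\infty} \frac{1}{2(\vartheta + h_0^{-2}\pi^2 j^2)^2} = \frac{h_0}{8\vartheta^{3/2}} \Big( \frac{1 + 4\vartheta^{1/2}h_0 e^{-2\vartheta^{1/2}h_0} - e^{-4\vartheta^{1/2}h_0}}{(1 - e^{-2\vartheta^{1/2}h_0})^2} - \frac{2}{\vartheta^{1/2}h_0} \Big),$

where the series is evaluated using the derivative with respect to $a$ in the
identity $\sum_{j=1}^{\infty} \frac{1}{j^2 + a^2} = \frac{1}{2a^2}(\pi a \coth(\pi a) - 1)$. Since we shall later let $h_0$ tend
to infinity, an essential point is the asymptotics $I(\vartheta) \sim h_0$.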
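As a plausibility check of the closed form in (5.2), one can compare a truncated version of the series $\sum_{j \ge 1} \frac{1}{2}(\vartheta + h_0^{-2}\pi^2 j^2)^{-2}$ with the expression obtained from the coth identity, written in terms of $e^{-2\vartheta^{1/2}h_0}$. The truncation level and the test values of $\vartheta$ and $h_0$ below are arbitrary assumptions of the sketch.

```python
import math

def fisher_series(theta, h0, J=20000):
    """Truncated series sum_{j<=J} 1 / (2 (theta + pi^2 j^2 / h0^2)^2)."""
    return sum(
        1.0 / (2.0 * (theta + (math.pi * j / h0) ** 2) ** 2)
        for j in range(1, J + 1)
    )

def fisher_closed_form(theta, h0):
    """Closed form obtained by differentiating the coth identity in a."""
    x = math.sqrt(theta) * h0
    q = math.exp(-2.0 * x)  # underflows harmlessly to 0.0 for large x
    bracket = (1.0 + 4.0 * x * q - q * q) / (1.0 - q) ** 2 - 2.0 / x
    return h0 / (8.0 * theta ** 1.5) * bracket

for theta, h0 in [(1.0, 5.0), (2.0, 3.0), (0.5, 10.0)]:
    print(theta, h0, fisher_series(theta, h0), fisher_closed_form(theta, h0))

# Linear growth in h0: I(theta) / h0 approaches 1 / (8 theta^(3/2)).
print(fisher_closed_form(1.0, 1000.0) / 1000.0)
```

The final line illustrates the asymptotics $I(\vartheta) \sim h_0$: the ratio $I(\vartheta)/h_0$ approaches $1/(8\vartheta^{3/2})$, with a relative correction of order $h_0^{-1}$.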

We split our observation design $\{kh \mid k = 0, \dots, h^{-1} - 1\}$ into blocks
$A_m = \{kh \mid k = (m-1)\ell, \dots, m\ell - 1\}$, $m = 1, \dots, (\ell h)^{-1}$, of length $\ell$ such that the
radius $v_\varepsilon$ of our nonparametric local neighborhood has the order of the
parametric noise level $(I(\vartheta)\ell)^{-1/2}$ in each block:

(5.3) $v_\varepsilon \sim (I(\vartheta)\ell)^{-1/2} \ \Rightarrow\ \ell \sim h_0^{-1} v_\varepsilon^{-2}.$

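For later convenience, we consider odd and even indices $k$ separately,
assuming that $h^{-1}$ and $\ell$ are even integers. This way, for each block $m$ observing
$(y_{jk}\pi j/h)$ for $j \ge 1$ and $k \in A_m$, $k$ odd, respectively, $k$ even, can be
modeled by the experiments

(5.4) $\mathcal{E}^{\mathrm{odd}}_{3,m} = \Big( \mathcal{X}^{\ell/2}, \mathcal{F}^{\otimes \ell/2}, \Big( \bigotimes_{k \in A_m \text{ odd}} P_{\sigma_0^2(kh) + v_\varepsilon s^2(kh)} \Big)_{s^2 \in C^\alpha(R)} \Big),$

(5.5) $\mathcal{E}^{\mathrm{even}}_{3,m} = \Big( \mathcal{X}^{\ell/2}, \mathcal{F}^{\otimes \ell/2}, \Big( \bigotimes_{k \in A_m \text{ even}} P_{\sigma_0^2(kh) + v_\varepsilon s^2(kh)} \Big)_{s^2 \in C^\alpha(R)} \Big),$

where all parameters are the same as for $\mathcal{E}_{2,\mathrm{loc}}$. Using the nonparametric
local asymptotic theory developed by Grama and Nussbaum (2002) and the
independence of the experiments $(\mathcal{E}^{\mathrm{odd}}_{3,m})_m$ [resp., $(\mathcal{E}^{\mathrm{even}}_{3,m})_m$], we are able to
prove in Section A.4 the following asymptotic equivalence.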

Proposition 5.2. Assume $\alpha > 1/2$, $\underline{\sigma}^2 > 0$ and $h_0 \sim \varepsilon^{-p}$ with $p \in (0, 1 - (2\alpha)^{-1})$
such that $(2h)^{-1} \in \mathbb{N}$. Then observing $\{y_{j,2k+1} \mid j \ge 1, k = 0, \dots, (2h)^{-1} - 1\}$
in experiment $\mathcal{E}_{2,\mathrm{loc}}$ is asymptotically equivalent to the local Gaussian
shift experiment $\mathcal{G}_{3,\mathrm{loc}}$ of observing

(5.6) $dY_t = \frac{1}{\sqrt{8}\,\sigma_0^{3/2}(t)} \Big( 1 - \frac{2}{h_0\sigma_0(t)} \Big)^{1/2} v_\varepsilon s^2(t)\,dt + (2\varepsilon)^{1/2}\,dW_t, \quad t \in [0,1],$

where the unknown $s^2$ and all parameters are the same as in $\mathcal{E}_{2,\mathrm{loc}}$. The Le
Cam distance tends to zero uniformly over the center of localization $\sigma_0^2 \in
\mathcal{S}(\alpha, R, \underline{\sigma}^2)$.

The same asymptotic equivalence result holds true for observing $\{y_{j,2k} \mid
j \ge 1, k = 0, \dots, (2h)^{-1} - 1\}$ in experiment $\mathcal{E}_{2,\mathrm{loc}}$.

Note that in this model, combining even and odd indices $k$, we can already
infer the LAN-result by Gloter and Jacod (2001a), but we still face a second-order
term of order $h_0^{-1} v_\varepsilon$ in the drift. This term is asymptotically negligible
only if it is of smaller order than the noise level $\varepsilon^{1/2}$. To be able to choose
$h_0$ sufficiently large, we have to require a larger Hölder smoothness of the
volatility.

Corollary 5.3. Assume $\alpha > (1+\sqrt{17})/8 \approx 0.64$, $\underline{\sigma}^2 > 0$ and $h_0 \sim \varepsilon^{-p}$ with
$p \in (0, 1 - (2\alpha)^{-1})$ such that $(2h)^{-1} \in \mathbb{N}$. Then observing $\{y_{j,2k+1} \mid j \ge 1, k =
0, \dots, (2h)^{-1} - 1\}$ in experiment $\mathcal{E}_{2,\mathrm{loc}}$ is asymptotically equivalent to the local
Gaussian shift experiment $\mathcal{G}_{4,\mathrm{loc}}$ of observing

(5.7) $dY_t = \frac{1}{\sqrt{8}\,\sigma_0^{3/2}(t)}\, v_\varepsilon s^2(t)\,dt + (2\varepsilon)^{1/2}\,dW_t, \quad t \in [0,1],$

where the unknown $s^2$ and all parameters are the same as in $\mathcal{E}_{2,\mathrm{loc}}$. The Le
Cam distance tends to zero uniformly over the center of localization $\sigma_0^2 \in
\mathcal{S}(\alpha, R, \underline{\sigma}^2)$.

The same asymptotic equivalence result holds true for observing $\{y_{j,2k} \mid
j \ge 1, k = 0, \dots, (2h)^{-1} - 1\}$ in experiment $\mathcal{E}_{2,\mathrm{loc}}$.

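Proof. For $\alpha > (1+\sqrt{17})/8$, the choice of $h_0 = \varepsilon^{-p}$ for some
$p \in \big(\frac{1}{4\alpha+2}, \frac{2\alpha-1}{2\alpha}\big)$
is possible and ensures that $h^\alpha = o(\varepsilon^{1/2})$ holds as well as $h_0^{-2} = o(v_\varepsilon^{-2}\varepsilon)$.
Therefore, the Kullback-Leibler divergence between the observations in $\mathcal{G}_{3,\mathrm{loc}}$
and in $\mathcal{G}_{4,\mathrm{loc}}$ evaluates by the Cameron-Martin (or Girsanov) formula to

$\varepsilon^{-1} \int_0^1 \frac{1}{32\sigma_0^3(t)} \Big( \Big( 1 - \frac{2}{h_0\sigma_0(t)} \Big)^{1/2} - 1 \Big)^2 v_\varepsilon^2 s^4(t)\,dt \lesssim \varepsilon^{-1} h_0^{-2} v_\varepsilon^2.$

Consequently, the Kullback-Leibler and thus also the total variation distance
tend to zero. □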
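The smoothness threshold arises from requiring that the admissible range of exponents $p$ in $h_0 = \varepsilon^{-p}$ be nonempty. The sketch below encodes the two constraints, ignoring logarithmic factors: $h^\alpha = o(\varepsilon^{1/2})$ with $h = h_0\varepsilon$ gives $p < 1 - (2\alpha)^{-1}$, and $h_0^{-q} v_\varepsilon^2 = o(\varepsilon)$ gives $p > 1/(q(2\alpha+1))$. The function name and the generic power $q$ are assumptions made for illustration; $q = 2$ corresponds to the proof above, and $q = 1$ reproduces the threshold $(1+\sqrt{5})/4 \approx 0.81$ of the main theorem.

```python
import math

def admissible_p_interval(alpha, power):
    """Open interval of exponents p with h0 = eps^(-p) satisfying both
    h^alpha = o(eps^(1/2)) for h = h0 * eps and h0^(-power) * v_eps^2 = o(eps),
    ignoring logarithmic factors. Returns None if the interval is empty."""
    upper = 1.0 - 1.0 / (2.0 * alpha)
    lower = 1.0 / (power * (2.0 * alpha + 1.0))
    return (lower, upper) if lower < upper else None

print((1.0 + math.sqrt(17.0)) / 8.0)  # threshold for power = 2
print((1.0 + math.sqrt(5.0)) / 4.0)   # threshold for power = 1
for alpha in (0.60, 0.65, 0.80, 0.82):
    print(alpha, admissible_p_interval(alpha, 2), admissible_p_interval(alpha, 1))
```

Solving `lower < upper` for $\alpha$ gives the quadratic inequalities $4\alpha^2 - \alpha - 1 > 0$ for $q = 2$ and $4\alpha^2 - 2\alpha - 1 > 0$ for $q = 1$, whose positive roots are the two thresholds.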

In a last step, we find local experiments $\mathcal{G}_{5,\mathrm{loc}}$, which are asymptotically
equivalent to $\mathcal{G}_{4,\mathrm{loc}}$ and do not depend on the center of localization $\sigma_0^2$.
To this end, we use a variance-stabilizing transform, based on the Taylor
expansion

$\sqrt{2}\,x^{1/4} = \sqrt{2}\,x_0^{1/4} + \frac{1}{\sqrt{8}}\,x_0^{-3/4}(x - x_0) + O((x - x_0)^2)$

which holds uniformly over $x, x_0$ on any compact subset of $(0, \infty)$. Inserting
$x = \sigma^2(t) = \sigma_0^2(t) + v_\varepsilon s^2(t)$ and $x_0 = \sigma_0^2(t)$ from our local model, we obtain

(5.8) $\sqrt{2\sigma(t)} = \sqrt{2\sigma_0(t)} + \frac{1}{\sqrt{8}}\,\sigma_0^{-3/2}(t)\,v_\varepsilon s^2(t) + O(v_\varepsilon^2).$
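The quadratic remainder in the Taylor expansion of the variance-stabilizing transform $x \mapsto \sqrt{2}\,x^{1/4}$ can be observed directly. The following sketch is only a numerical illustration; the expansion point and the increments are arbitrary.

```python
import math

def stabilizing_transform(x):
    """sqrt(2) * x^(1/4), i.e. sqrt(2 * sigma) when x = sigma^2."""
    return math.sqrt(2.0) * x ** 0.25

def linearization(x, x0):
    """First-order Taylor expansion of the transform around x0."""
    return stabilizing_transform(x0) + (x - x0) / (math.sqrt(8.0) * x0 ** 0.75)

x0 = 1.5
for v in [1e-1, 1e-2, 1e-3]:
    err = abs(stabilizing_transform(x0 + v) - linearization(x0 + v, x0))
    print(v, err, err / v ** 2)  # the last column should stay roughly constant
```

The ratio in the last column stabilizes near $\frac{3\sqrt{2}}{32}\,x_0^{-7/4}$, half the absolute value of the second derivative at $x_0$, confirming that the linearization error is of order $(x - x_0)^2$.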

Since $v_\varepsilon^2 = o(\varepsilon^{1/2})$ holds for $\alpha > 1/2$, we can add the uninformative signal
$\sqrt{2\sigma_0(t)}$ to $Y$ in $\mathcal{G}_{4,\mathrm{loc}}$, replace the drift by $\sqrt{2\sigma(t)}$ and still keep
convergence of the total variation distance; compare the preceding proof.
Consequently, from Corollary 5.3 we obtain the following result.

Corollary 5.4. Assume $\alpha > (1+\sqrt{17})/8 \approx 0.64$, $\underline{\sigma}^2 > 0$ and $h_0 \sim \varepsilon^{-p}$ with
$p \in (0, 1 - (2\alpha)^{-1})$ such that $(2h)^{-1} \in \mathbb{N}$. Then observing $\{y_{j,2k+1} \mid j \ge 1, k =
0, \dots, (2h)^{-1} - 1\}$ in the experiment $\mathcal{E}_{2,\mathrm{loc}}$ is asymptotically equivalent to the
local Gaussian shift experiment $\mathcal{G}_{5,\mathrm{loc}}$ of observing

(5.9) $dY_t = \sqrt{2\sigma(t)}\,dt + (2\varepsilon)^{1/2}\,dW_t, \quad t \in [0,1],$

where the unknown is $\sigma^2 = \sigma_0^2 + v_\varepsilon s^2$ and all parameters are the same as in
$\mathcal{E}_{2,\mathrm{loc}}$. The Le Cam distance tends to zero uniformly over the center of localization
$\sigma_0^2 \in \mathcal{S}(\alpha, R, \underline{\sigma}^2)$.

The same asymptotic equivalence result holds true for observing $\{y_{j,2k} \mid
j \ge 1, k = 0, \dots, (2h)^{-1} - 1\}$ in experiment $\mathcal{E}_{2,\mathrm{loc}}$.
6. Globalization. The globalization now basically follows the usual route,
first established by Nussbaum (1996). Essential for us is to show that observing
$(y_{jk})$ for $j \ge 1$ is asymptotically sufficient in $\mathcal{E}_2$. Then we can split the
white noise observation experiment $\mathcal{E}_2$ into two independent sub-experiments
obtained from $(y_{jk})$ for $k$ odd and $k$ even, respectively. Usually, a white noise
experiment can be split into two independent subexperiments with the same
drift and an increase by $\sqrt{2}$ in the noise level. Here, however, this does not
work since the two diffusions in the random drift remain the same and thus
independence fails.

Let us introduce the $L^2$-normalized step functions

$\varphi_{0,k}(t) := (2h)^{-1/2}\big(\mathbf{1}_{[(k-1)h,kh]}(t) - \mathbf{1}_{[kh,(k+1)h]}(t)\big), \quad k = 1, \dots, h^{-1} - 1,$
$\varphi_{0,0}(t) := h^{-1/2}\,\mathbf{1}_{[0,h]}(t).$

We obtain a normalized complete basis $(\varphi_{jk})_{j \ge 0,\, 0 \le k \le h^{-1}-1}$ of $L^2([0,1])$ such
that observing $Y$ in experiment $\mathcal{E}_2$ is equivalent to observing

$y_{jk} := \int_0^1 \varphi_{jk}(t)\,dY_t, \quad j \ge 0,\ k = 0, \dots, h^{-1} - 1.$


Calculating the Fourier series, we can express the tent function $\Phi_{0,k}$ with
$\Phi'_{0,k} = \varphi_{0,k}$ and $\Phi_{0,k}(1) = 0$ as an $L^2$-convergent series over the dilated sine
functions $\Phi_{jk}$ and $j \ge 1$:

(6.1) $\Phi_{0,k}(t) = \sum_{j \ge 1} (-1)^{j+1}\Phi_{j,k-1}(t) + \sum_{j \ge 1} \Phi_{jk}(t), \quad k = 1, \dots, h^{-1} - 1.$

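The series representation (6.1) of the tent function can be checked numerically by computing the $L^2$ distance between the tent and the truncated sine series. The sketch below uses $\Phi_{jk}(t) = \frac{\sqrt{2h}}{\pi j}\sin(j\pi(t-kh)/h)$ on block $k$; the block size, block index, truncation levels and the midpoint quadrature are arbitrary assumptions of the example.

```python
import math

def tent(t, k, h):
    """Tent function Phi_{0,k}: rises on block k-1, falls on block k, peak sqrt(h/2)."""
    if (k - 1) * h <= t <= k * h:
        return (t - (k - 1) * h) / math.sqrt(2.0 * h)
    if k * h < t <= (k + 1) * h:
        return ((k + 1) * h - t) / math.sqrt(2.0 * h)
    return 0.0

def sine_bump(t, j, k, h):
    """Dilated sine function Phi_{jk} supported on block k."""
    if k * h <= t <= (k + 1) * h:
        return math.sqrt(2.0 * h) / (math.pi * j) * math.sin(j * math.pi * (t - k * h) / h)
    return 0.0

def partial_sum(t, k, h, J):
    """Right-hand side of (6.1) truncated at frequency J."""
    return sum(
        (-1) ** (j + 1) * sine_bump(t, j, k - 1, h) + sine_bump(t, j, k, h)
        for j in range(1, J + 1)
    )

def l2_error(k, h, J, m=4000):
    """Midpoint-rule L2 distance between the tent and the truncated series."""
    a, step = (k - 1) * h, 2.0 * h / m
    total = 0.0
    for i in range(m):
        t = a + (i + 0.5) * step
        total += (tent(t, k, h) - partial_sum(t, k, h, J)) ** 2
    return math.sqrt(total * step)

e10 = l2_error(3, 0.1, 10)
e100 = l2_error(3, 0.1, 100)
print(e10, e100)
```

The $L^2$ error decays only like $J^{-1/2}$, reflecting the jump of the odd periodic extension at the peak of the tent; the convergence is in $L^2$, not uniform.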

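We also have $\Phi_{0,0}(t) = -\sqrt{2} \sum_{j \ge 1} \Phi_{j,0}(t)$. By partial integration, this implies
(with $L^2$-convergence)

$\beta_{0,k} := \langle \varphi_{0,k}, X \rangle = -\int_0^1 \Phi_{0,k}(t)\,dX_t = \sum_{j \ge 1} (-1)^{j+1}\beta_{j,k-1} + \sum_{j \ge 1} \beta_{jk}, \quad$ where $\beta_{jk} := \langle \varphi_{jk}, X \rangle$,

for $k \ge 1$ and similarly $\beta_{0,0} = -\sqrt{2} \sum_{j \ge 1} \beta_{j,0}$. This means that the signal $\beta_{0,k}$ in
$y_{0,k}$ can be perfectly reconstructed from the signals in the $y_{j,k-1}$, $y_{jk}$. For
jointly Gaussian random variables, we obtain the conditional law in $\mathcal{E}_2$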

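$\mathcal{L}(\beta_{jk} \mid y_{jk}) = N\Big( \frac{\mathrm{Var}(\beta_{jk})}{\mathrm{Var}(y_{jk})}\,y_{jk},\ \varepsilon^2\,\frac{\mathrm{Var}(\beta_{jk})}{\mathrm{Var}(y_{jk})} \Big),$

which depends on the unknown $\sigma^2(kh)$. Given the results by Stone (1982)
and our less-informative Gaussian shift experiment $\mathcal{G}_1$ for $\alpha > 1/2$, $\underline{\sigma}^2 > 0$,
there is an estimator $\hat{\sigma}^2_\varepsilon$ based on $(y_{1,k})_k$ in $\mathcal{E}_2$ with

(6.2) $\lim_{\varepsilon \to 0} \inf_{\sigma^2 \in \mathcal{S}} P_{\sigma^2,\varepsilon}\big( \|\hat{\sigma}^2_\varepsilon - \sigma^2\|_\infty \le R v_\varepsilon \big) = 1,$
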
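where $v_\varepsilon = \varepsilon^{\alpha/(2\alpha+1)} \log(\varepsilon^{-1})$ as in the definitions of the localized experiments.

In a randomization step, we can thus generate independent $N(0,1)$-distributed
random variables $\rho_{jk}$ to construct from $(y_{jk})_{j \ge 1,k}$

$\tilde{\beta}_{jk} := \frac{\mathrm{Var}_\varepsilon(\beta_{jk})}{\mathrm{Var}_\varepsilon(y_{jk})}\,y_{jk} + \varepsilon\,\frac{\mathrm{Var}_\varepsilon(\beta_{jk})^{1/2}}{\mathrm{Var}_\varepsilon(y_{jk})^{1/2}}\,\rho_{jk}, \quad j \ge 1,$

where the variance $\mathrm{Var}_\varepsilon$ is the expression for $\mathrm{Var}$ where the unknown values
$\sigma^2(kh)$ are replaced by the estimated values $\hat{\sigma}^2_\varepsilon(kh)$:

(6.3) $\mathrm{Var}_\varepsilon(y_{jk}) = \mathrm{Var}_\varepsilon(\beta_{jk}) + \varepsilon^2, \quad \mathrm{Var}_\varepsilon(\beta_{jk}) = h^2\pi^{-2}j^{-2}\hat{\sigma}^2_\varepsilon(kh).$

From this, we define $\tilde{\beta}_{0,k} := \sum_{j \ge 1} ((-1)^{j+1}\tilde{\beta}_{j,k-1} + \tilde{\beta}_{jk})$, $\tilde{\beta}_{0,0} := -\sqrt{2} \sum_{j \ge 1} \tilde{\beta}_{j,0}$
and generate artificial observations $(\tilde{y}_{0,k})$ such that the conditional law
$\mathcal{L}((\tilde{y}_{0,k})_k \mid (y_{jk})_{j \ge 1,k})$ corresponds to $\mathcal{L}((y_{0,k})_k \mid (y_{jk})_{j \ge 1,k})$ in the sense that
it is multivariate normal with mean $(\tilde{\beta}_{0,k})_k$ and (tri-diagonal) covariance
matrix $\varepsilon^2(\langle \varphi_{0,k}, \varphi_{0,k'} \rangle)_{k,k'}$.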

In Section A.5, we shall prove that the Hellinger distance between the families
of centered Gaussian random variables $\mathcal{Y} := \{y_{jk} \mid j \ge 0, k = 0, \dots, h^{-1} - 1\}$
and $\tilde{\mathcal{Y}} := \{\tilde{y}_{0,k} \mid k = 0, \dots, h^{-1} - 1\} \cup \{y_{jk} \mid j \ge 1, k = 0, \dots, h^{-1} - 1\}$ tends
to zero, provided $h_0^{-1} v_\varepsilon^2 = o(\varepsilon)$, which is possible when $\alpha > (1+\sqrt{5})/4$ with the
choice $h_0 = \varepsilon^{-p}$ for some $p \in \big(\frac{1}{2\alpha+1}, \frac{2\alpha-1}{2\alpha}\big)$. In particular, this means that
$(y_{jk})_{j \ge 1,k}$ is asymptotically sufficient and the information in $(y_{0,k})_k$ is asymptotically
negligible.

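Proposition 6.1. Assume $\alpha > (1+\sqrt{5})/4 \approx 0.81$, $\underline{\sigma}^2 > 0$ and $h^{-1}$ an even
integer. Then the experiment $\mathcal{E}_2$ is asymptotically equivalent to the product
experiment $\mathcal{E}_{2,\mathrm{odd}} \otimes \mathcal{E}_{2,\mathrm{even}}$ where $\mathcal{E}_{2,\mathrm{odd}}$ is obtained from the observations
$\{y_{j,2k+1} \mid j \ge 1, k = 0, \dots, (2h)^{-1} - 1\}$ and $\mathcal{E}_{2,\mathrm{even}}$ from the observations
$\{y_{j,2k} \mid j \ge 1, k = 0, \dots, (2h)^{-1} - 1\}$ in experiment $\mathcal{E}_2$.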

This key result makes it possible to globalize the local result. In the sequel, we
always assume $\alpha > (1+\sqrt{5})/4$ and $\underline{\sigma}^2 > 0$. We start with the asymptotic equivalence
between $\mathcal{E}_2$ and $\mathcal{E}_{2,\mathrm{odd}} \otimes \mathcal{E}_{2,\mathrm{even}}$. Using again an estimator $\hat{\sigma}^2_\varepsilon$ in $\mathcal{E}_{2,\mathrm{odd}}$
satisfying (6.2), we can localize the second factor $\mathcal{E}_{2,\mathrm{even}}$ around $\hat{\sigma}^2_\varepsilon$ and
therefore by Corollary 5.4 replace it by experiment $\mathcal{G}_{5,\mathrm{loc}}$; see Theorem 3.2
in Nussbaum (1996) for a formal proof. Since $\mathcal{G}_{5,\mathrm{loc}}$ does not depend on the
center $\hat{\sigma}^2_\varepsilon$, we conclude that $\mathcal{E}_2$ is asymptotically equivalent to the product
experiment $\mathcal{E}_{2,\mathrm{odd}} \otimes \mathcal{G}_5$ where $\mathcal{G}_5$ has the same parameters as $\mathcal{E}_2$ and is given
by observing $Y$ in (5.9). Now we use an estimator $\hat{\sigma}^2_\varepsilon$ in $\mathcal{G}_5$ satisfying (6.2),
whose existence is ensured by Stone (1982), to localize $\mathcal{E}_{2,\mathrm{odd}}$. Corollary
5.4 then allows again to replace the localized $\mathcal{E}_{2,\mathrm{odd}}$-experiment by $\mathcal{G}_5$ such
that $\mathcal{E}_2$ is asymptotically equivalent to the product experiment $\mathcal{G}_5 \otimes \mathcal{G}_5$. Finally,
taking the mean of the independent observations (5.9) in both factors,
which is a sufficient statistic (or, abstractly, due to identical likelihood processes),
we see that $\mathcal{G}_5 \otimes \mathcal{G}_5$ is equivalent to the experiment $\mathcal{G}_0$ of observing
$dY_t = \sqrt{2\sigma(t)}\,dt + \sqrt{\varepsilon}\,dW_t$, $t \in [0,1]$. Our final result then follows from the
asymptotic equivalence between $\mathcal{E}_0$ and $\mathcal{E}_1$ as well as between $\mathcal{E}_1$ and $\mathcal{E}_2$.

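Theorem 6.2. Assume $\alpha > (1+\sqrt{5})/4 \approx 0.81$ and $\delta_n, \underline{\sigma}^2, R > 0$. Then the
regression experiment $\mathcal{E}_0(n, \delta_n, \alpha, R, \underline{\sigma}^2)$ is for $n \to \infty$ and $\delta_n^{-2} n^{-\alpha} \to 0$ asymptotically
equivalent to the Gaussian shift experiment $\mathcal{G}_0(\delta_n n^{-1/2}, \alpha, R, \underline{\sigma}^2)$ of
observing

(6.4) $dY_t = \sqrt{2\sigma(t)}\,dt + \delta_n^{1/2} n^{-1/4}\,dW_t, \quad t \in [0,1],$

for $\sigma^2 \in \mathcal{S}(\alpha, R, \underline{\sigma}^2)$.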

7. Discussion. Our results show that inference for the volatility in the
high-frequency observation model under microstructure noise $\mathcal{E}_0$ is asymptotically
as difficult as in the well-understood Gaussian shift model $\mathcal{G}_0$. Note
that the constructions in Gloter and Jacod (2001a, 2001b) rely on
preliminary estimators at the boundary of suitable blocks, while we require
$\mathrm{supp}\,\Phi_{jk} = [kh, (k+1)h]$ to obtain independence among blocks. In
this context, Proposition 6.1 shows asymptotic sufficiency of observing only
the increment process $X_t - X_{kh}$, $t \in [kh, (k+1)h]$, on each block due to
$\int \varphi_{jk}(t)\,dt = 0$ for $j \ge 1$. Naturally, the $(\varphi_{jk})_{j \ge 1}$ form exactly the eigenfunctions
of the covariance operator of Brownian motion on $[kh, (k+1)h]$ and it
suffices to use the block-wise Karhunen-Loève expansion for inference.

It should be remarked that a fortiori asymptotic equivalence also holds
when using, instead of the $(\varphi_{jk})$, different basis functions on each block
spanning the orthogonal complement of the constant functions (i.e.,
integrating to zero). For practical applications, especially when estimating the
spot volatility curve, the blocking might produce artifacts, and wavelet bases,
which realize a well-localized time-frequency analysis, seem to be well suited;
compare Hoffmann, Munk and Schmidt-Hieber (2010).

It is interesting to note that both model $\mathcal{E}_0$ and model $\mathcal{G}_0$ are
homogeneous in the sense that factors from the noise (i.e., the $dW_t$-term) can be
moved to the drift term and vice versa such that, for example, high volatility
can counterbalance a high noise level $\delta$ or a large observation distance
$1/n$. Another phenomenon is that observing $\mathcal{E}_0$ $m$ times independently with
$n$ observations each (i.e., with $m$ different realizations of the process $X$) is
asymptotically as informative as observing $\mathcal{E}_0$ with $m^2 n$ observations (i.e.,
with one realization of the process $X$): both experiments are asymptotically
equivalent to $dY_t = \sqrt{2\sigma(t)}\,dt + m^{-1/2}\delta^{1/2} n^{-1/4}\,dW_t$. Similarly, by rescaling
we can treat observations on intervals $[0,T]$ with $T > 0$ fixed: observing
$Y_i = X_{iT/n} + \varepsilon_i$, $i = 1,\dots,n$, in $\mathcal{E}_0$ with $X_t = \int_0^t \sigma(s)\,dB_s$, $t \in [0,T]$, is under
the same conditions asymptotically equivalent to observing

$dY_u = \sqrt{2\sigma(Tu)}\,du + \delta^{1/2} T^{-1/4} n^{-1/4}\,dW_u, \quad u \in [0,1],$

or equivalently,

$dY_v = \sqrt{2\sigma(v)}\,dv + \delta^{1/2} (T/n)^{1/4}\,dW_v, \quad v \in [0,T].$
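
The homogeneity statements above reduce to simple arithmetic on the noise level of the Gaussian shift. The sketch below checks two of them numerically; the helper `noise_level` and its parameters are hypothetical names introduced only for this illustration.

```python
import math

def noise_level(delta, n, T=1.0, m=1):
    """Noise level delta^{1/2} (T/n)^{1/4} of the Gaussian shift on [0, T],
    divided by sqrt(m) when the mean of m independent copies is observed."""
    return math.sqrt(delta) * (T / n) ** 0.25 / math.sqrt(m)

delta, n, m = 0.01, 30_000, 4
replicated = noise_level(delta, n, m=m)   # m independent realizations, n observations each
pooled = noise_level(delta, m * m * n)    # one realization with m^2 * n observations
stretched = noise_level(delta, n, T=16.0) # same n spread over an interval 16 times as long
```

Both `replicated` and `pooled` equal $m^{-1/2}\delta^{1/2}n^{-1/4}$, and stretching the observation interval by a factor 16 doubles the noise level on the original time scale.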

Concerning the various restrictions on the smoothness $\alpha$ of the volatility $\sigma^2$,
one might wonder whether the critical index is $\alpha = 1/2$ in view of the classical
asymptotic equivalence results [Brown and Low (1996), Nussbaum (1996)].
In our approach, we still face the second-order term in (5.6) and, using the
localized results, a much easier globalization yields for $\alpha > 1/2$ only that $\mathcal{E}_0$
is asymptotically not less informative than observing

$dY_t = F(\sigma^2(t))\,dt + \delta^{1/2} n^{-1/4}\,dW_t, \quad t \in [0,1],$

with $F(x) = \int_1^x (y^{1/2} - 2h_0^{-1})^{1/2}\, y^{-1}\,dy/\sqrt{8}$, which includes a small, but
non-negligible second-order term since $h_0$ cannot tend to infinity too quickly.

On the other hand, a simple construction shows that for $\alpha < 1/3$ asymptotic
equivalence fails. In the regression model $\mathcal{E}_0$ with $n$ observations, we
cannot distinguish between $X_n(t) = \int_0^t \sigma_n(s)\,dB_s$ with $\sigma_n^2(t) = 1 + n^{-\alpha}\cos(\pi n t)$,
$\|\sigma_n^2\|_{C^\alpha} = 2 + n^{-\alpha}$, and standard Brownian motion ($\sigma^2 = 1$) since
$X_n(i/n) - X_n((i-1)/n) \sim N(0, 1/n)$ i.i.d. holds. Here, we choose the noise level
$\delta_n = n^{1/2-2\alpha}$ such that the requirement $\delta_n^{-2} n^{-\alpha} \to 0$ in Theorem 6.2 holds due to
$\alpha < 1/3$.

Yet, we obtain $\int_0^1 (\sqrt{2\sigma_n(t)} - \sqrt{2})^2\,dt \sim n^{-2\alpha}$, which shows that the
signal-to-noise ratio in the Gaussian shift model $\mathcal{G}_0$ with diffusion coefficient
$\delta_n^{1/2} n^{-1/4}$ is of order $n^{-2\alpha}/(\delta_n n^{-1/2}) = 1$ and a Neyman-Pearson test
between $\sigma_n^2$ and $1$ can distinguish both signals with a positive probability.
This different behavior for testing in $\mathcal{E}_0$ and $\mathcal{G}_0$ implies that both models
cannot be asymptotically equivalent for $\alpha < 1/3$. Note that Gloter and Jacod
(2001a) merely require $\alpha > 1/4$ for their LAN-result, but our counterexample
is excluded by their parametric setting. In conclusion, the behavior in
the zone $\alpha \in [1/3, (1+\sqrt{5})/4]$ remains unexplored. If we restrict to a constant
noise level $\delta$ in the regression model $\mathcal{E}_0$, then the same argument gives
a counterexample for regularity $\alpha < 1/4$.
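
The key step of the counterexample, namely that the oscillating volatility $\sigma_n^2(t) = 1 + n^{-\alpha}\cos(\pi n t)$ integrates to exactly $1/n$ over every sampling interval, can be checked numerically. The sketch below uses a midpoint rule; the function name and the chosen values of $n$ and $\alpha$ are illustrative assumptions.

```python
import math

def increment_variance(i, n, alpha, steps=2000):
    """Midpoint-rule value of the integral of sigma_n^2 over ((i-1)/n, i/n),
    where sigma_n^2(t) = 1 + n^{-alpha} cos(pi n t)."""
    a, dt = (i - 1) / n, 1.0 / (n * steps)
    total = 0.0
    for s in range(steps):
        t = a + (s + 0.5) * dt
        total += (1.0 + n ** (-alpha) * math.cos(math.pi * n * t)) * dt
    return total

n, alpha = 50, 0.3
variances = [increment_variance(i, n, alpha) for i in range(1, n + 1)]
max_gap = max(abs(v - 1.0 / n) for v in variances)  # distance from the Brownian value 1/n

delta_n = n ** (0.5 - 2 * alpha)                    # noise level chosen in the text
requirement = delta_n ** (-2) * n ** (-alpha)       # equals n^{3 alpha - 1}
```

Since `requirement` equals $n^{3\alpha-1}$, it tends to zero exactly when $\alpha < 1/3$, which is the range in which the counterexample applies.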

8. Applications. Let us first consider the nonparametric problem of
estimating the spot volatility $\sigma^2(t)$. From our asymptotic equivalence result
in Theorem 6.2 we can deduce, at least for bounded loss functions, the
usual nonparametric minimax rates, but with the number $n$ of observations
replaced by $\sqrt{n}$, provided $\sigma^2 \in C^\alpha$ for $\alpha > (1+\sqrt{5})/4$, as the mapping
$\sqrt{\sigma(t)} \mapsto \sigma^2(t)$ is a $C^\infty$-diffeomorphism for volatilities $\sigma^2$ bounded away
from zero. Since the results obtained so far only concern rates, it
is even simpler to use our less informative model $\mathcal{G}_1$ or, more concretely, the
observations $(y_k)$ in (3.3), which are independent in $\mathcal{E}_2$, centered and of
variance $h^2\pi^{-2}\sigma^2(kh) + \varepsilon^2$. With $h = \varepsilon$, a local (kernel or wavelet) averaging over
$\varepsilon^{-2}\pi^2 y_k^2 - \pi^2$ therefore yields rate-optimal estimators for classical pointwise
or $L^p$-type loss functions.

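For later use, we choose $h = \varepsilon$ in $\mathcal{E}_2$ and propose the simple estimator

(8.1) $\hat\sigma_\varepsilon^2(t) := \frac{\varepsilon}{2b} \sum_{k:\,|k\varepsilon - t| \le b} (\varepsilon^{-2}\pi^2 y_k^2 - \pi^2)$

for some bandwidth $b > 0$. Since $\zeta_k$ is $\chi^2(1)$-distributed, it is standard [Stone
(1982)] to show that with the choice $b \sim (\varepsilon\log(\varepsilon^{-1}))^{1/(2\alpha+1)}$ we have the
sup-norm risk bound

in particular, we shall need that $\hat\sigma_\varepsilon^2$ is consistent in sup-norm loss.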
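
The local averaging described above can be sketched in a few lines. The code simulates independent centered statistics $y_k$ with variance $h^2\pi^{-2}\sigma^2(kh) + \varepsilon^2$ for $h = \varepsilon$ and averages the unbiased quantities $\varepsilon^{-2}\pi^2 y_k^2 - \pi^2$ over a window. It is a minimal illustration under stated assumptions: the function names, the uniform (box) kernel, the test volatility and the bandwidth are all hypothetical choices, not the tuned procedure of the paper.

```python
import math
import random

def simulate_block_statistics(sigma2, eps, seed=1):
    """Independent y_k ~ N(0, h^2 pi^-2 sigma2(kh) + eps^2) with block size h = eps."""
    rng = random.Random(seed)
    h = eps
    blocks = int(round(1.0 / h))
    return [rng.gauss(0.0, math.sqrt(h * h * sigma2(k * h) / math.pi ** 2 + eps * eps))
            for k in range(blocks)]

def spot_volatility(ys, eps, t, b):
    """Average the unbiased local estimates pi^2 (eps^-2 y_k^2 - 1) over |k eps - t| <= b."""
    terms = [math.pi ** 2 * (y * y / (eps * eps) - 1.0)
             for k, y in enumerate(ys) if abs(k * eps - t) <= b]
    return sum(terms) / len(terms)

eps = 1e-5
ys = simulate_block_statistics(lambda t: 1.0 + t, eps)   # hypothetical sigma^2(t) = 1 + t
estimate = spot_volatility(ys, eps, t=0.5, b=0.1)        # target value sigma^2(0.5) = 1.5
```

Each summand is very noisy (its standard deviation is of the order of $\pi^2\sqrt{2}$), so a large number of blocks inside the window is needed before the average stabilises.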

In terms of the regression experiment $\mathcal{E}_0$, we work (in an asymptotically
equivalent way) with the linear interpolation $Y'$ of the observations $(Y_i)$; see
the proof of Theorem 2.2. By partial integration, we can thus take for any
$j, k$

(8.2) $y^0_{jk} := -\int_0^1 \Phi_{jk}(t)\,Y''(t)\,dt = \sum_{i=1}^n \Bigl(-n\int_{(i-1)/n}^{i/n} \Phi_{jk}(t)\,dt\Bigr)(Y_i - Y_{i-1}),$

setting $Y_0 := 0$. Interpreting the integral terms as weights, the $y^0_{jk}$ are just
local averages over the increments as in the pre-averaging approach.
Podolskij and Vetter (2009) use Haar functions as $\Phi_{jk}$ (they were aware of the fact
that discretized sine functions would slightly increase the Fisher
information), but they have not used higher frequencies $j$.
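
The weights in (8.2) are easy to compute once $\Phi_{jk}$ is specified. The sketch below assumes the sine form $\Phi_{jk}(t) = \sqrt{2h}\,(j\pi)^{-1}\sin(j\pi(t-kh)/h)$ on $[kh,(k+1)h]$, which is not stated in this section and is chosen here because it matches $\|\Phi_{jk}\|_{L^2} = h/(j\pi)$; the function names are likewise hypothetical.

```python
import math

def preaveraging_weights(j, k, h, n):
    """Weights -n * (integral of Phi_jk over ((i-1)/n, i/n)) for i = 1..n.

    Assumption: Phi_jk(t) = sqrt(2h)/(j pi) * sin(j pi (t - kh)/h) on [kh, (k+1)h], zero elsewhere.
    """
    amp = math.sqrt(2.0 * h) / (j * math.pi)
    lo, hi = k * h, (k + 1) * h

    def primitive(t):
        # Antiderivative of Phi_jk; clipping t to the block makes it constant outside
        t = min(max(t, lo), hi)
        return -amp * h / (j * math.pi) * math.cos(j * math.pi * (t - lo) / h)

    return [-n * (primitive(i / n) - primitive((i - 1) / n)) for i in range(1, n + 1)]

def block_statistic(ys, j, k, h):
    """y^0_jk as a weighted sum of the increments Y_i - Y_{i-1}, with Y_0 := 0."""
    weights = preaveraging_weights(j, k, h, len(ys))
    prev, total = 0.0, 0.0
    for w, y in zip(weights, ys):
        total += w * (y - prev)
        prev = y
    return total

n, h = 10_000, 0.1
w = preaveraging_weights(j=2, k=3, h=h, n=n)
energy = sum(wi * wi for wi in w) / n    # Riemann sum for the squared L2 norm of Phi_jk
```

Under the assumed form of $\Phi_{jk}$, `energy` approximates $h^2/(j\pi)^2$, the factor multiplying $\sigma^2(kh)$ in the variance of $y_{jk}$.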

Since we use the concrete coupling by linear interpolation to define $y^0_{jk}$
in $\mathcal{E}_0$ and since convergence in total variation is stronger than weak
convergence, all asymptotics for probabilities and weak convergence results for
functionals $F((y_{jk})_{jk})$ in $\mathcal{E}_2$ remain true for $F((y^0_{jk})_{jk})$ in $\mathcal{E}_0$, uniformly over
the parameter class. The formal argument for the latter is that whenever
$\|P_n - Q_n\|_{TV} \to 0$ and $P_n^{X_n} \to P^X$ weakly for some random variables $X_n$, we
have for all bounded and continuous $g$

$E_{Q_n}[g(X_n)] = E_{P_n}[g(X_n)] + O(\|g\|_\infty \|P_n - Q_n\|_{TV}) \xrightarrow{n\to\infty} E_P[g(X)].$

Thus, for $\alpha > 1/2$, $\underline\sigma^2 > 0$ and $b \sim (n^{-1/2}\log n)^{1/(2\alpha+1)}$ the estimator

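(8.3) $\hat\sigma_n^2(t) := \frac{n^{-1/2}}{2b} \sum_{k:\,|kn^{-1/2} - t| \le b} (n\pi^2 (y^0_k)^2 - \pi^2)$

satisfies in the regression experiment $\mathcal{E}_0$

(8.4) $\lim_{R\to\infty}\liminf_{n\to\infty}\inf_{\sigma^2} P_{\sigma^2}\bigl(n^{\alpha/(4\alpha+2)}(\log n)^{-1}\|\hat\sigma_n^2 - \sigma^2\|_\infty \le R\bigr) = 1.$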

The asymptotic equivalence can be applied to construct estimators for the
integrated volatility $\int_0^1 \sigma^2(t)\,dt$ or, more generally, $p$th order integrals $\int_0^1 \sigma^p(t)\,dt$
using the approach developed by Ibragimov and Khas'minskii (1991) for
white noise models like $\mathcal{G}_0$. In our notation, their Theorem 7.1 yields an
estimator $\hat\vartheta_{p,n}$ of $\int_0^1 \sigma^p(t)\,dt$ in $\mathcal{G}_0$ such that

$\Bigl|\hat\vartheta_{p,n} - \int_0^1 \sigma^p(t)\,dt - \delta^{1/2} n^{-1/4}\sqrt{2}\,p \int_0^1 \sigma^{p-1/2}(t)\,dW_t\Bigr| = o(n^{-1/4})$

holds uniformly over $\sigma^2 \in \mathcal{S}(\alpha, R, \underline\sigma^2)$ for any $\alpha, R, \underline\sigma^2 > 0$ since the functional
$\sqrt{2\sigma(\cdot)} \mapsto \int_0^1 \sigma^p(t)\,dt$ is smooth on $L^2$. Their LAN-result shows that
asymptotic normality with rate $n^{-1/4}$ and variance $2p^2\delta\int_0^1 \sigma^{2p-1}(t)\,dt$ is minimax
optimal. Specializing to the case $p = 2$ for integrated volatility, the
asymptotic variance is $8\delta\int_0^1 \sigma^3(t)\,dt$. It should be stressed here that the existing
estimation procedures for integrated volatility are globally suboptimal for
our idealized model in the sense that their asymptotic variances involve the
integrated quarticity $\int_0^1 \sigma^4(t)\,dt$, which can at most yield optimal variance for
constant values of $\sigma^2$, because otherwise $\int_0^1 \sigma^4(t)\,dt > (\int_0^1 \sigma^3(t)\,dt)^{4/3}$ follows
from Jensen's inequality. The fundamental reason is that all these estimators
are based on quadratic forms of the increments depending on global tuning
parameters, whereas optimizing weights locally permits one to attain the above
efficiency bound, as we shall see.
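
The Jensen inequality $\int\sigma^4 \ge (\int\sigma^3)^{4/3}$ used above can be illustrated numerically. The sketch below computes the ratio $(\int\sigma^4)^{3/4}/\int\sigma^3$, which equals one for constant volatility and exceeds one otherwise. Both example curves and the function names are hypothetical choices made for this illustration only.

```python
def integrate(f, steps=100_000):
    """Midpoint rule on [0, 1]."""
    return sum(f((i + 0.5) / steps) for i in range(steps)) / steps

def jensen_ratio(sigma):
    """(int sigma^4)^{3/4} / int sigma^3, which is at least 1 by Jensen's inequality."""
    cubic = integrate(lambda t: sigma(t) ** 3)
    quartic = integrate(lambda t: sigma(t) ** 4)
    return quartic ** 0.75 / cubic

flat = jensen_ratio(lambda t: 0.05)                          # constant volatility
bowl = jensen_ratio(lambda t: 0.02 + 0.5 * (t - 0.5) ** 2)   # hypothetical U-shaped curve
```

The more the volatility varies over the day, the larger the ratio, and hence the larger the potential gain from local rather than global weighting.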

Instead of following these more abstract approaches, we use our analysis,
which is fundamentally a local likelihood approach, to construct a simple
estimator of the integrated volatility with optimal asymptotic variance. First,
we use the statistics $(y_{jk})$ in $\mathcal{E}_2$ and then transfer the results to $\mathcal{E}_0$ using
$(y^0_{jk})$ from (8.2).

On each block $k$, we have at our disposal in $\mathcal{E}_2$ independent
$N(0, h^2 j^{-2}\pi^{-2}\sigma^2(kh) + \varepsilon^2)$-observations $y_{jk}$ for $j \ge 1$. A maximum-likelihood
estimator $\hat\sigma^2(kh)$ in this exponential family satisfies the estimating equation

(8.5) $\hat\sigma^2(kh) = \sum_{j\ge 1} w_{jk}(\hat\sigma^2)\, h^{-2} j^2\pi^2 (y_{jk}^2 - \varepsilon^2),$

(8.6) where $w_{jk}(\sigma^2) := \dfrac{(\sigma^2(kh) + h_0^{-2}\pi^2 j^2)^{-2}}{\sum_{l\ge 1}(\sigma^2(kh) + h_0^{-2}\pi^2 l^2)^{-2}}.$

This can be solved numerically, yet it is a nonconvex problem (personal
communication by J. Schmidt-Hieber). Classical MLE-theory, however,
asserts for fixed $h$, $k$ and a consistent initial estimator $\hat\sigma_\varepsilon^2(kh)$ that only one
Newton step suffices to ensure asymptotic efficiency. Because of $h \to 0$ this
immediate argument does not apply here, but it still gives rise to the estimator

$\widehat{IV}_\varepsilon := \sum_{k=0}^{h^{-1}-1} h \sum_{j\ge 1} w_{jk}(\hat\sigma_\varepsilon^2)\, h^{-2} j^2\pi^2 (y_{jk}^2 - \varepsilon^2)$

of the integrated volatility $IV := \int_0^1 \sigma^2(t)\,dt$. Assuming the $L^\infty$-consistency
$\|\hat\sigma_\varepsilon^2 - \sigma^2\|_\infty \to 0$ in probability for the initial estimator, we assert in $\mathcal{E}_2$ the
efficiency result

$\varepsilon^{-1/2}(\widehat{IV}_\varepsilon - IV) \xrightarrow{\mathcal{L}} N\Bigl(0, 8\int_0^1 \sigma^3(t)\,dt\Bigr).$

To prove this, it suffices by Slutsky's lemma to show

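(8.7) $\varepsilon^{-1/2}\Bigl(\sum_{k=0}^{h^{-1}-1} h \sum_{j\ge 1} w_{jk}(\sigma^2)\, h^{-2} j^2\pi^2 (y_{jk}^2 - \varepsilon^2) - IV\Bigr) \xrightarrow{\mathcal{L}} N\Bigl(0, 8\int_0^1 \sigma^3(t)\,dt\Bigr),$

(8.8) $|w_{jk}(\hat\sigma_\varepsilon^2) - w_{jk}(\sigma^2)| \lesssim w_{jk}(\sigma^2)\,\|\hat\sigma_\varepsilon^2 - \sigma^2\|_\infty$ uniformly over $j, k$.
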

The second assertion (8.8) follows from inserting the Lipschitz property
that $W(x) := (x + h_0^{-2}\pi^2 j^2)^{-2}$ satisfies $|W'(x)| \lesssim W(x)$, and thus
$|W(x) - W(y)| \lesssim W(x)|x - y|$ uniformly over $x, y \ge \underline\sigma^2 > 0$.

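For the first assertion (8.7), note that in $\mathcal{E}_2$ the estimator $\widehat{IV}_\varepsilon$ with the
weights $w_{jk}(\sigma^2)$ is unbiased and

$\operatorname{Var}\Bigl(\sum_{j\ge 1} w_{jk}(\sigma^2)\, h^{-2} j^2\pi^2 (y_{jk}^2 - \varepsilon^2)\Bigr) = \dfrac{2}{\sum_{j\ge 1}(\sigma^2(kh) + h_0^{-2}\pi^2 j^2)^{-2}}.$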

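We now use the identity, derived as (5.2),

(8.9) $\sum_{j\ge 1} \dfrac{\lambda^3}{(\lambda^2 + \pi^2 j^2)^2} = \dfrac{1 + 4\lambda e^{-2\lambda} - e^{-4\lambda}}{4(1 - e^{-2\lambda})^2} - \dfrac{1}{2\lambda}$

and obtain by Riemann sum approximation as $h_0 \to \infty$ (with arbitrary speed)

$\varepsilon^{-1}\operatorname{Var}(\widehat{IV}_\varepsilon) = \sum_{k=0}^{h^{-1}-1} \dfrac{2h^2\varepsilon^{-1}}{\sum_{j\ge 1}(\sigma^2(kh) + h_0^{-2}\pi^2 j^2)^{-2}} \to 8\int_0^1 \sigma^3(t)\,dt.$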
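
The series identity for $\sum_{j\ge1}\lambda^3/(\lambda^2+\pi^2j^2)^2$ invoked above is easy to confirm numerically, which also shows why the per-block variance behaves like $8\sigma^3/h_0$: with $\lambda = h_0\sigma$ the closed form tends to $1/4$ as $\lambda \to \infty$. The following check is a minimal sketch; the function names and the truncation level of the series are illustrative choices.

```python
import math

def series(lam, terms=200_000):
    """Partial sum of sum_{j>=1} lam^3 / (lam^2 + pi^2 j^2)^2."""
    return sum(lam ** 3 / (lam ** 2 + (math.pi * j) ** 2) ** 2 for j in range(1, terms + 1))

def closed_form(lam):
    """(1 + 4 lam e^{-2 lam} - e^{-4 lam}) / (4 (1 - e^{-2 lam})^2) - 1 / (2 lam)."""
    e2 = math.exp(-2.0 * lam)
    return (1.0 + 4.0 * lam * e2 - e2 * e2) / (4.0 * (1.0 - e2) ** 2) - 1.0 / (2.0 * lam)

checks = [(lam, series(lam), closed_form(lam)) for lam in (0.5, 3.0, 50.0)]
```

For large $\lambda$ the closed form is approximately $1/4 - 1/(2\lambda)$, so the correction to the limiting value $1/4$ is of relative order $h_0^{-1}$.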

Due to the independence and Gaussianity of the $(y_{jk})$, we deduce also

$E\Bigl[\Bigl(\sum_{j\ge 1} w_{jk}(\sigma^2)\, h^{-2} j^2\pi^2 (y_{jk}^2 - E[y_{jk}^2])\Bigr)^4\Bigr] \lesssim h_0^{-2}$

such that the central limit theorem under a Lyapounov condition with power
$p = 4$ [e.g., Shiryaev (1995)] proves assertion (8.7), assuming $h \to 0$ and $h_0 \to \infty$.
A feasible estimator is obtained by neglecting frequencies larger than
some $J = J(\varepsilon)$:

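(8.10) $\widehat{IV}_{\varepsilon,J} := \sum_{k=0}^{h^{-1}-1} h \sum_{j=1}^{J} w^J_{jk}(\hat\sigma_\varepsilon^2)\, h^{-2} j^2\pi^2 (y_{jk}^2 - \varepsilon^2),$

(8.11) where $w^J_{jk}(\sigma^2) := \dfrac{(\sigma^2(kh) + h_0^{-2}\pi^2 j^2)^{-2}}{\sum_{l=1}^{J}(\sigma^2(kh) + h_0^{-2}\pi^2 l^2)^{-2}}.$

A simple calculation yields $E[|\widehat{IV}_{\varepsilon,J} - \widehat{IV}_\varepsilon|^2] \lesssim \varepsilon (h_0/J)^3$ such that for
$h_0/J \to 0$ convergence in probability implies again by Slutsky's lemma

$\varepsilon^{-1/2}(\widehat{IV}_{\varepsilon,J} - IV) \xrightarrow{\mathcal{L}} N\Bigl(0, 8\int_0^1 \sigma^3(t)\,dt\Bigr).$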

By the above argument, weak convergence results transfer from $\mathcal{E}_2$ to
$\mathcal{E}_0$ and we obtain the following result, where we give a concrete choice of
the initial estimator, the block size $h$ and the spectral cut-off $J$ [we just
need some consistent estimator $\hat\sigma_n^2$, $h^{2\alpha} n^{1/2} \to 0$ as well as $h n^{1/2} \to \infty$ and
$J^{-1} = o(h^{-1} n^{-1/2})$].

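Theorem 8.1. Let $y^0_{jk}$ for $j \ge 1$, $k = 0, \dots, h^{-1} - 1$ be the statistics (8.2)
from model $\mathcal{E}_0$. For $h \sim n^{-1/2}\log(n)$ and $J/\log(n) \to \infty$ consider the
estimator of integrated volatility

$\widehat{IV}_n := \sum_{k=0}^{h^{-1}-1} h \sum_{j=1}^{J} w^J_{jk}(\hat\sigma_n^2)\, h^{-2} j^2\pi^2 \bigl((y^0_{jk})^2 - \delta^2/n\bigr)$

with weights $w^J_{jk}$ from (8.11) and the initial estimator $\hat\sigma_n^2$ from (8.3). Then
$\widehat{IV}_n$ is asymptotically efficient in the sense that

$n^{1/4}(\widehat{IV}_n - IV) \xrightarrow{\mathcal{L}} N\Bigl(0, 8\delta\int_0^1 \sigma^3(t)\,dt\Bigr) \quad \text{as } n \to \infty,$

provided $\sigma^2$ is strictly positive and $\alpha$-Hölder continuous with $\alpha > 1/2$.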
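
The structure of the truncated spectral estimator can be sketched in the idealized block model $\mathcal{E}_2$, where the $y_{jk}$ are independent $N(0, h^2 j^{-2}\pi^{-2}\sigma^2(kh) + \varepsilon^2)$. The code below is an illustration under simplifying assumptions, not the implementation used for the simulations reported in the paper: the statistics are simulated directly rather than computed from (8.2), the pilot estimate is a hypothetical constant, and all names and parameter values are invented for the example.

```python
import math
import random

def simulate_spectral_statistics(sigma2, h, eps, J, seed=2):
    """y_jk ~ N(0, h^2 j^-2 pi^-2 sigma2(kh) + eps^2), independent, k < 1/h, 1 <= j <= J."""
    rng = random.Random(seed)
    blocks = int(round(1.0 / h))
    return [[rng.gauss(0.0, math.sqrt(h * h * sigma2(k * h) / (j * math.pi) ** 2 + eps * eps))
             for j in range(1, J + 1)] for k in range(blocks)]

def integrated_volatility(y, pilot, h, eps, J):
    """Weighted sum over blocks and frequencies with inverse-variance weights
    (pilot + h0^-2 pi^2 j^2)^-2, normalised over j = 1..J."""
    h0 = h / eps
    total = 0.0
    for k, row in enumerate(y):
        raw = [(pilot(k * h) + (math.pi * j / h0) ** 2) ** -2 for j in range(1, J + 1)]
        norm = sum(raw)
        block = sum(w / norm * (math.pi * j / h) ** 2 * (yj * yj - eps * eps)
                    for j, (w, yj) in enumerate(zip(raw, row), start=1))
        total += h * block
    return total

h, eps, J = 0.01, 1e-4, 300
y = simulate_spectral_statistics(lambda t: 1.0 + t, h, eps, J)    # hypothetical sigma^2(t) = 1 + t
iv_hat = integrated_volatility(y, pilot=lambda t: 1.5, h=h, eps=eps, J=J)
```

Because the weights sum to one on every block, the estimate stays unbiased for $\sum_k h\,\sigma^2(kh)$ even with a crude pilot; the pilot only affects the variance.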

A straightforward implementation of $\widehat{IV}_n$ shows a finite sample behavior
as predicted by the asymptotic results. We present some simulation
results for a situation with simplified, but realistic model parameters. The
sample size $n = 30{,}000$ corresponds to roughly one observation per second
and the noise level is set to $\delta = 0.01$. The spot volatility curve
$\sigma(t) = 0.02 + 0.2(t - 1/2)^4$ is bowl-shaped, reflecting the empirical evidence of high
volatility at opening and closing. In Figure 1 (left) the spot volatility and its
estimate $\hat\sigma$ on 30 blocks are presented. Instead of (8.1), we use a local-linear
estimator to catch the boundary values slightly better. Also for the
integrated volatility estimator we use $h^{-1} = 30$ blocks ($h \approx 6/\sqrt{n}$, or expressed
in real time about 12-minute intervals), but the estimator is quite robust
to this choice. Theoretically the maximal frequency $J$ can be as large as
possible, but due to discretization there is no more information in higher
frequencies than the block sample size. With a look at the error analysis,
we use $J := \min(2\bar\sigma h/(\pi\delta), nh)$ with $\bar\sigma$ denoting some upper bound on the
volatility, which in our case evaluates to $J = 43$.

In Figure 1 (right), we show the integrated volatility estimation results
obtained from 10,000 Monte Carlo iterations. The horizontal line gives the
true value $IV = 0.0023$. The first box plot presents the result using the
weights with estimated spot volatility, while the results with optimal oracle
weights are shown in the second box plot. We see that the estimators are
practically unbiased and do not suffer from many outliers. The empirical
root mean squared error with estimated weights is only 5.0% larger than
the asymptotic approximation $(8\frac{\delta}{\sqrt{n}}\int_0^1 \sigma^3(t)\,dt)^{1/2}$. With oracle weights, this
reduces to 4.1%. An optimal procedure with global tuning achieves
asymptotically $(8\frac{\delta}{\sqrt{n}}(\int_0^1 \sigma^4(t)\,dt)^{3/4})^{1/2}$, which in our case is 19% larger. Our
experience with the well-established multiscale estimator confirms this size when
oracle weights are used. Yet, it seems that the performance of the multiscale
estimator suffers significantly from estimated weights.

Stochastic volatility models are also recovered quite well by our
implementation. The simple quadratic form of the estimator $\widehat{IV}_n$ suggests that
in this case a stable central limit theorem can be derived by the usual
methods. Note, however, that the analysis cannot simply rely on our asymptotic
equivalence result since $\mathcal{E}_0$ becomes non-Gaussian and, even more, Le Cam
theory for stochastic parameters (like $\sigma^2$) needs to be developed. In the spirit
of Mykland (2010), we content ourselves with the theoretical results, which
elucidate the underlying fundamental structures for the basic model and
allow straightforward extensions to more complex models.



APPENDIX 

A.1. Gaussian measures, Hellinger distance and Hilbert-Schmidt norm.

We gather basic facts about cylindrical Gaussian measures, the Hellinger 
distance and their interplay. 

Formally, we realize the white noise experiments as $L^2$-indexed Gaussian
variables; for example, in experiment $\mathcal{E}_1$ we observe for any $f \in L^2([0,1])$

$Y_f := \langle f, dY\rangle := \int_0^1 f(t)\Bigl(\int_0^t \sigma(s)\,dB(s)\Bigr)dt + \varepsilon\int_0^1 f(t)\,dW_t.$

Canonically, we thus define $P^{\sigma,\varepsilon}$ on the set $\Omega = \mathbb{R}^{L^2([0,1])}$ with product Borel
$\sigma$-algebra $\mathcal{F} = \mathfrak{B}^{\otimes L^2([0,1])}$ (realizing a cylindrical centered Gaussian
measure). Its covariance structure is given by

$E[Y_f Y_g] = \langle Cf, g\rangle, \quad f, g \in L^2([0,1]),$

with the covariance operator $C : L^2([0,1]) \to L^2([0,1])$ given by

$Cf(t) = \int_0^1 \Bigl(\int_0^{t\wedge u} \sigma^2(s)\,ds\Bigr) f(u)\,du + \varepsilon^2 f(t), \quad f \in L^2([0,1]).$

Note that $C$ is not trace class and thus does not define a Gaussian measure
on $L^2([0,1])$ itself.

In the construction, it suffices to prescribe $(Y_{e_m})_{m\ge 1}$ for an orthonormal
basis $(e_m)_{m\ge 1}$ and to set

$Y_f := \sum_{m=1}^{\infty} \langle f, e_m\rangle Y_{e_m}.$

This way, we can define $P^{\sigma,\varepsilon}$ equivalently on the sequence space $\Omega = \mathbb{R}^{\mathbb{N}}$
with product $\sigma$-algebra $\mathcal{F} = \mathfrak{B}^{\otimes\mathbb{N}}$. This is useful when extending results
from finite dimensions.

The Hellinger distance between two probability measures $P$ and $Q$ on
$(\Omega, \mathcal{F})$ is defined as

$H(P,Q) := \Bigl(\int_\Omega (\sqrt{p} - \sqrt{q})^2\,d\mu\Bigr)^{1/2},$

where $\mu$ denotes a dominating measure, for example, $\mu = P + Q$, and $p$ and $q$
denote the respective densities. The total variation distance is smaller than
the Hellinger distance:

(A.1) $\|P - Q\|_{TV} \le H(P,Q).$

The identity $H^2(P,Q) = 2 - 2\int \sqrt{p}\sqrt{q}\,d\mu$ implies the bound for finite or
countably infinite product measures

(A.2) $H^2\Bigl(\bigotimes_n P_n, \bigotimes_n Q_n\Bigr) \le \sum_n H^2(P_n, Q_n).$

Moreover, the Hellinger distance is invariant under bi-measurable bijections
$T : \Omega \to \Omega'$ since with the densities $p\circ T^{-1}$, $q\circ T^{-1}$ of the image measures $P^T$
and $Q^T$ with respect to $\mu^T$ we have

(A.3) $H^2(P^T, Q^T) = \int_{\Omega'} \bigl(\sqrt{p\circ T^{-1}} - \sqrt{q\circ T^{-1}}\bigr)^2\,d\mu^T = \int_\Omega (\sqrt{p} - \sqrt{q})^2\,d\mu = H^2(P,Q).$

For the one-dimensional Gaussian laws $N(0,1)$ and $N(0,\sigma^2)$, we derive
$H^2(N(0,1), N(0,\sigma^2)) = 2 - \sqrt{8\sigma/(\sigma^2+1)} \le 2(\sigma^2 - 1)^2.$

For the multi-dimensional Gaussian laws $N(0,\Sigma_1)$ and $N(0,\Sigma_2)$ with invertible
covariance matrices $\Sigma_1, \Sigma_2 \in \mathbb{R}^{d\times d}$, we obtain by linear transformation
and independence, denoting by $\lambda_1, \dots, \lambda_d$ the eigenvalues of $\Sigma_1^{-1/2}\Sigma_2\Sigma_1^{-1/2}$:

$H^2(N(0,\Sigma_1), N(0,\Sigma_2)) = H^2\bigl(N(0,\mathrm{Id}), N(0,\Sigma_1^{-1/2}\Sigma_2\Sigma_1^{-1/2})\bigr) \le \sum_{k=1}^{d} 2(\lambda_k - 1)^2.$

The last sum is nothing but twice the squared Hilbert-Schmidt (or Frobenius) norm
of $\Sigma_1^{-1/2}\Sigma_2\Sigma_1^{-1/2} - \mathrm{Id}$ such that

(A.4) $H^2(N(0,\Sigma_1), N(0,\Sigma_2)) \le 2\|\Sigma_1^{-1/2}(\Sigma_2 - \Sigma_1)\Sigma_1^{-1/2}\|_{HS}^2.$
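
The scalar formula $H^2(N(0,1),N(0,\sigma^2)) = 2 - \sqrt{8\sigma/(\sigma^2+1)}$ and the bound $2(\sigma^2-1)^2$ can be verified by direct numerical integration of $(\sqrt{p}-\sqrt{q})^2$. The sketch below does this with a midpoint rule; the function names, the integration window and the tested standard deviations are illustrative choices.

```python
import math

def hellinger_sq_formula(s):
    """Closed form 2 - sqrt(8 s / (s^2 + 1)) for N(0,1) versus N(0, s^2)."""
    return 2.0 - math.sqrt(8.0 * s / (s * s + 1.0))

def hellinger_sq_numeric(s, lim=30.0, steps=100_000):
    """Midpoint-rule value of the integral of (sqrt(p) - sqrt(q))^2 over [-lim, lim]."""
    dx = 2.0 * lim / steps
    total = 0.0
    for i in range(steps):
        x = -lim + (i + 0.5) * dx
        p = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        q = math.exp(-0.5 * (x / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
        total += (math.sqrt(p) - math.sqrt(q)) ** 2 * dx
    return total

values = [(s, hellinger_sq_formula(s), hellinger_sq_numeric(s)) for s in (0.5, 1.0, 1.3, 2.0)]
```

The bound $2(\sigma^2-1)^2$ is far from sharp for $\sigma$ away from one, but it has the right quadratic order near $\sigma = 1$, which is all that is needed for (A.4).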

Observing that (A.2) and (A.3) also apply to Gaussian measures on the
sequence space $\mathbb{R}^{\mathbb{N}}$, the bound (A.4) is also valid for (cylindrical) Gaussian
measures $N(0,\Sigma_i)$ with self-adjoint positive definite covariance operators
$\Sigma_i : L^2([0,1]) \to L^2([0,1])$.

The Hilbert-Schmidt norm of a linear operator $A : H \to H$ on any separable
real Hilbert space $H$ can be expressed by its action on an orthonormal
basis $(e_m)$ via

$\|A\|_{HS}^2 = \sum_{m,n} \langle Ae_m, e_n\rangle^2,$

which for a matrix is just the usual Frobenius norm. For self-adjoint
operators $A, B$ with $|\langle Av, v\rangle| \le |\langle Bv, v\rangle|$ for all $v \in H$, we use the eigenbasis $(e_m)$
of $A$ and obtain

(A.5) $\|A\|_{HS}^2 = \sum_m \langle Ae_m, e_m\rangle^2 \le \sum_{m,n} \langle Be_m, e_n\rangle^2 = \|B\|_{HS}^2.$

Furthermore, it is straightforward to see for any bounded operator $T$

(A.6) $\|TA\|_{HS} \le \|T\|\,\|A\|_{HS}, \quad \|AT\|_{HS} \le \|T\|\,\|A\|_{HS}$

with the usual operator norm $\|T\|$ of $T$. Finally, for integral operators
$Kf(x) = \int_0^1 k(x,y) f(y)\,dy$ on $L^2([0,1])$ it is well known that

(A.7) $\|K\|_{HS} = \|k\|_{L^2([0,1]^2)}.$
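
As a concrete instance of (A.7), consider the covariance operator of standard Brownian motion with kernel $k(t,s) = t\wedge s$. Its Hilbert-Schmidt norm can be computed either from the kernel or from its eigenvalues $4/((2k-1)^2\pi^2)$, a standard fact that is assumed here rather than taken from this appendix. The sketch below compares both routes; the function names and discretisation sizes are illustrative.

```python
import math

def kernel_norm_sq(steps=400):
    """Midpoint-rule value of the double integral of min(t, s)^2 over [0,1]^2."""
    pts = [(i + 0.5) / steps for i in range(steps)]
    return sum(min(t, s) ** 2 for t in pts for s in pts) / steps ** 2

def eigenvalue_norm_sq(terms=100_000):
    """Sum of the squared eigenvalues 4 / ((2k-1)^2 pi^2), k = 1, 2, ..."""
    return sum((4.0 / ((2 * k - 1) * math.pi) ** 2) ** 2 for k in range(1, terms + 1))

from_kernel = kernel_norm_sq()
from_spectrum = eigenvalue_norm_sq()
```

Both quantities approximate $1/6$, the exact value of $\int_0^1\int_0^1 (t\wedge s)^2\,ds\,dt$.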

For two Gaussian laws with different mean vectors $\mu_1, \mu_2$ and with the
same invertible covariance matrix $\Sigma$, we can similarly use the transformation
$\Sigma^{-1/2}$ and the scalar case $H^2(N(m_1,1), N(m_2,1)) = 2(1 - e^{-(m_1-m_2)^2/8}) \le
(m_1 - m_2)^2/4$ to conclude by independence

(A.8) $H^2(N(\mu_1,\Sigma), N(\mu_2,\Sigma)) \le \tfrac14\|\Sigma^{-1/2}(\mu_1 - \mu_2)\|^2.$

Combining (A.4) and (A.8), we obtain by the triangle inequality the bound

(A.9) $H^2(N(\mu_1,\Sigma_1), N(\mu_2,\Sigma_2)) \le \tfrac12\|\Sigma_1^{-1/2}(\mu_1 - \mu_2)\|^2 + 4\|\Sigma_1^{-1/2}(\Sigma_2 - \Sigma_1)\Sigma_1^{-1/2}\|_{HS}^2.$

A.2. Proof of Theorem 2.2. We first show that $\mathcal{E}_1$ is asymptotically at
least as informative as $\mathcal{E}_0$ for $\varepsilon = \delta/\sqrt{n}$ and $\alpha > 0$. From $\mathcal{E}_1$ with $\varepsilon = \delta/\sqrt{n}$,
we can generate the observations (statistics)

$\tilde Y_i := n\int_{(2i-1)/2n}^{(2i+1)/2n} dY_t = n\int_{(2i-1)/2n}^{(2i+1)/2n} X_t\,dt + \tilde\varepsilon_i, \quad i = 1, \dots, n-1,$

$\tilde Y_n := 2n\int_{(2n-1)/2n}^{1} dY_t = 2n\int_{(2n-1)/2n}^{1} X_t\,dt + \tilde\varepsilon_n,$

with $\tilde\varepsilon_i = n\varepsilon(W_{(2i+1)/2n} - W_{(2i-1)/2n}) \sim N(0,\delta^2)$ and similarly $\tilde\varepsilon_n \sim N(0,\delta^2)$,
all independent. In contrast to standard equivalence proofs, it turns out to
be essential here to take $\tilde Y_i$ as a mean symmetric around the point $i/n$. Since
$(Y_i)$ and $(\tilde Y_i)$ are defined on the same sample space, using inequality (A.1)
it suffices to prove that the Hellinger distance between the law of $(Y_i)$ and
the law of $(\tilde Y_i)$ tends to zero as $n$ tends to infinity.

For the integrated volatility function, we introduce the notation

$a(t) := \int_0^t \sigma^2(s)\,ds, \quad 0 \le t \le 1.$

For notational convenience, we also set $a(1+s) := a(1-s)$ for $s > 0$.

The covariance matrix $\Sigma^Y$ of the centered Gaussian vector $(Y_i)$ is given by

$\Sigma^Y_{kl} := E[Y_k Y_l] = a(k/n) + \delta^2 \mathbf{1}(k = l), \quad 1 \le k \le l \le n.$

Similarly, the covariance matrix $\Sigma^{\tilde Y}$ of the centered Gaussian vector $(\tilde Y_i)$ is
given by

$\Sigma^{\tilde Y}_{kl} := E[\tilde Y_k \tilde Y_l] = n\int_{(2k-1)/2n}^{(2k+1)/2n} a(t)\,dt + \delta^2 \mathbf{1}(k = l), \quad 1 \le k \le l \le n,$

where for $k = l = n$ we used the convention for $a(1+s)$ above. We bound the
Hellinger distance using consecutively (A.4), $\Sigma^Y \ge \delta^2\,\mathrm{Id}$ in (A.5) and (A.2),
a Taylor expansion for $a$ and treating the case $k = l = n$ by a Lipschitz bound
separately:

$H^2(\mathcal{L}(Y_i, i = 1,\dots,n), \mathcal{L}(\tilde Y_i, i = 1,\dots,n))$

$\quad \le 2\|(\Sigma^Y)^{-1/2}(\Sigma^Y - \Sigma^{\tilde Y})(\Sigma^Y)^{-1/2}\|_{HS}^2$

$\quad \le 2\delta^{-4}\|\Sigma^Y - \Sigma^{\tilde Y}\|_{HS}^2$

$\quad \le 4\delta^{-4}\sum_{1\le k\le l\le n}\Bigl(n\int_{(2k-1)/2n}^{(2k+1)/2n}(a(t) - a(k/n))\,dt\Bigr)^2$

$\quad \le 4\delta^{-4}\Bigl(O(R^2 n^{-2}) + n\sum_{k=1}^{n}\Bigl(n\int_{(2k-1)/2n}^{(2k+1)/2n}\bigl(a'(k/n)(t - k/n) + O(Rn^{-1-\alpha})\bigr)\,dt\Bigr)^2\Bigr)$

$\quad = 4\delta^{-4}\bigl(O(R^2 n^{-2}) + O(R^2 n^{2-2-2\alpha})\bigr)$

$\quad = O(\delta^{-4}R^2 n^{-2\alpha}).$

Consequently, by (A.1) the total variation and thus also the Le Cam
distance between the experiments of observing $(Y_i)$ and of observing $(\tilde Y_i)$ tends
to zero for $n \to \infty$, which proves that the white noise experiment $\mathcal{E}_1$ is
asymptotically at least as informative as the regression experiment $\mathcal{E}_0$.

To show the converse, we build from the regression experiment $\mathcal{E}_0$ a
continuous-time observation by linear interpolation. To this end, we
introduce the linear B-splines (or hat functions) $b_i(t) = b(t - i/n)$ with $b(t) =
\min(1 + nt, 1 - nt)\mathbf{1}_{[-1/n,1/n]}(t)$ and set

$Y'_t := \sum_{i=1}^n Y_i b_i(t) = \sum_{i=1}^n X_{i/n} b_i(t) + \sum_{i=1}^n \varepsilon_i b_i(t), \quad t \in [0,1].$

Note that $(Y'_t)$ is a centered Gaussian process with covariance function

$c(t,s) := E[Y'_t Y'_s] = \sum_{i,j=1}^n a((i\wedge j)/n)\, b_i(t) b_j(s) + \delta^2\sum_{i=1}^n b_i(t) b_i(s), \quad 0 \le t, s \le 1.$

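For any $f \in L^2([0,1])$, we thus obtain

$E[\langle f, Y'\rangle^2] = \sum_{i,j=1}^n a((i\wedge j)/n)\langle f, b_i\rangle\langle f, b_j\rangle + \delta^2\sum_{i=1}^n \langle f, b_i\rangle^2
\le \sum_{i,j=1}^n a((i\wedge j)/n)\langle f, b_i\rangle\langle f, b_j\rangle + \delta^2 n^{-1}\|f\|_{L^2}^2$

because $\int n b_i = 1$ yields by Jensen's inequality $\langle f, nb_i\rangle^2 \le \langle f^2, nb_i\rangle$ and we
have $\sum_i b_i \le 1$. This means that the covariance operator $C$ induced by the
kernel $c$ is smaller than

$\bar C f(t) := \sum_{i,j=1}^n a((i\wedge j)/n)\langle f, b_j\rangle b_i(t) + \delta^2 n^{-1} f(t), \quad f \in L^2([0,1]),$

in the sense that $\bar C - C$ is positive (semi-)definite. Now observe that $\bar C$ is
the covariance operator of the white noise observations

(A.10) $d\bar Y_t = \sum_{i=1}^n X_{i/n} b_i(t)\,dt + \frac{\delta}{\sqrt{n}}\,dW_t, \quad t \in [0,1].$

Hence, we can generate these observations from $(Y'_t)$ by randomization, that
is, by adding independent, uninformative $N(0, \bar C - C)$-noise to $Y'$. Now it
is easy to see that observing $\bar Y$ in (A.10) and $Y$ from $\mathcal{E}_1$ is asymptotically
equivalent, since in terms of the respective covariance operators, using again
(A.4), (A.5) and (A.2), the squared Hellinger distance satisfies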



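$H^2(\mathcal{L}(\bar Y), \mathcal{L}(Y))$

$\quad \le 2\|(C^Y)^{-1/2}(\bar C - C^Y)(C^Y)^{-1/2}\|_{HS}^2$

$\quad \le 2\delta^{-4}n^2\int_0^1\!\!\int_0^1\Bigl(a(t\wedge s) - \sum_{i,j=1}^n a((i\wedge j)/n)\, b_i(t) b_j(s)\Bigr)^2 dt\,ds$

$\quad = 2\delta^{-4}n^2\int_0^1\!\!\int_0^1\Bigl(\sum_{i,j=0}^n \bigl(a(t\wedge s) - a((i\wedge j)/n)\bigr) b_i(t) b_j(s)\Bigr)^2 dt\,ds,$

where for the last line we have used $\sum_{i=0}^n b_i(t) = 1$ and $a(0) = 0$. Since
$b_i(t) \ne 0$ can only hold when $i - \lfloor nt\rfloor \in \{0,1\}$, the $\alpha$-Hölder regularity of $\sigma^2$
implies for $t \le s - 1/n$:

$\Bigl(\sum_{i,j=0}^n \bigl(a(t\wedge s) - a((i\wedge j)/n)\bigr) b_i(t) b_j(s)\Bigr)^2$

$\quad = \Bigl(\sum_{k,l=0}^1 \bigl(a(t) - a((k + \lfloor nt\rfloor)/n)\bigr)\, b_{k+\lfloor nt\rfloor}(t)\, b_{l+\lfloor ns\rfloor}(s)\Bigr)^2$

$\quad = O(R^2 n^{-2-2\alpha}) + \Bigl(a'(\lfloor nt\rfloor/n)\sum_{k=0}^1 \bigl(t - (k + \lfloor nt\rfloor)/n\bigr)\, b_{k+\lfloor nt\rfloor}(t)\Bigr)^2$

$\quad = O(R^2 n^{-2-2\alpha}),$

where the last sum vanishes because the hat functions reproduce linear functions.
A symmetric argument gives the same bound for $s \le t - 1/n$. For $|t - s| <
1/n$, we use only the Lipschitz continuity of $a$ to obtain the bound $O(R^2 n^{-2})$.
Altogether, we have found

$H^2(\mathcal{L}(\bar Y), \mathcal{L}(Y)) \le 2\delta^{-4}n^2\bigl(O(R^2 n^{-2-2\alpha}) + n^{-1}O(R^2 n^{-2})\bigr) = O(\delta^{-4}R^2 n^{-2\alpha}),$

which together with the transformation in the other direction shows that
the Le Cam distance between $\mathcal{E}_0$ and $\mathcal{E}_1$ is of order $O(\delta^{-2}Rn^{-\alpha})$.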
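
Two properties of the hat functions carry the interpolation argument above: they sum to one on $[0,1]$ (when $i$ runs from $0$ to $n$) and they reproduce linear functions, so that $\sum_i (t - i/n)\,b_i(t) = 0$. The sketch below checks both on a grid; the function names, the value of $n$ and the grid are illustrative choices.

```python
def hat(i, n, t):
    """Linear B-spline b_i(t) = b(t - i/n) with b(t) = min(1 + n t, 1 - n t) on [-1/n, 1/n]."""
    return max(0.0, min(1.0 + n * (t - i / n), 1.0 - n * (t - i / n)))

def interpolate(values, t):
    """Piecewise linear interpolation sum_i values[i] * b_i(t) through the knots i/n, i = 0..n."""
    n = len(values) - 1
    return sum(v * hat(i, n, t) for i, v in enumerate(values))

n = 8
grid = [j / 100 for j in range(101)]
unity_gap = max(abs(sum(hat(i, n, t) for i in range(n + 1)) - 1.0) for t in grid)
linear_gap = max(abs(interpolate([i / n for i in range(n + 1)], t) - t) for t in grid)
```

Both gaps are zero up to rounding, which is why only the Hölder remainder of $a'$ survives in the bound for $|t - s| \ge 1/n$.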

A.3. Proof of Proposition 3.2. The main tool is Proposition A.1 below.
Together with the Hölder bound

$|\sigma^2(\lfloor s\rfloor_h) - \sigma^2(s)| \le Rh^\alpha, \quad s \in [0,1],$

it implies that for fixed $\sigma$ the observation laws in $\mathcal{E}_1$ and $\mathcal{E}_2$ have a Hellinger
distance of order $Rh^\alpha\underline\sigma^{-3/2}\varepsilon^{-1/2}$. By inequality (A.1), this translates to the
total variation and thus to the Le Cam distance.

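Proposition A.1. For $\varepsilon > 0$ and continuous $\sigma : [0,1] \to (0,\infty)$ consider
the law $P^{\sigma,\varepsilon}$ generated by

$dY_t = \Bigl(\int_0^t \sigma(s)\,dB(s)\Bigr)dt + \varepsilon\,dW_t, \quad t \in [0,1],$

with independent Brownian motions $B$ and $W$. Then the Hellinger distance
between two laws $P^{\sigma_1,\varepsilon}$ and $P^{\sigma_2,\varepsilon}$ satisfies

$H(P^{\sigma_1,\varepsilon}, P^{\sigma_2,\varepsilon}) \lesssim \|\sigma_1^2 - \sigma_2^2\|_\infty\, \varepsilon^{-1/2}\max_t \sigma_1^{-3/2}(t).$

Proof. The covariance operator $C_\sigma$ of $P^{\sigma,\varepsilon}$ is for $f, g \in L^2([0,1])$ with
antiderivatives $F, G$ satisfying $F(1) = G(1) = 0$ given by

$\langle C_\sigma f, g\rangle = E[\langle f, dY\rangle\langle g, dY\rangle] = E[\langle f, X\rangle\langle g, X\rangle] + \varepsilon^2\langle f, g\rangle
= \int_0^1 F(t)G(t)\sigma^2(t)\,dt + \varepsilon^2\int_0^1 f(t)g(t)\,dt.$
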

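For covariance operators corresponding to $\sigma_1, \sigma_2$, we have by twofold partial
integration

$|\langle (C_{\sigma_1} - C_{\sigma_2})f, f\rangle| = \Bigl|\int_0^1\!\!\int_0^1\Bigl(\int_0^{t\wedge s}(\sigma_1^2 - \sigma_2^2)(u)\,du\Bigr) f(t) f(s)\,ds\,dt\Bigr|$

$\quad = \Bigl|\int_0^1 F(u)^2 (\sigma_1^2 - \sigma_2^2)(u)\,du\Bigr| \le \|\sigma_1^2 - \sigma_2^2\|_\infty\int_0^1 F(u)^2\,du = \|\sigma_1^2 - \sigma_2^2\|_\infty\langle C_{BM}f, f\rangle$

with $C_{BM}g(t) := \int_0^1 (t\wedge s) g(s)\,ds$, the covariance operator of standard
Brownian motion. Using further the ordering $C_{\sigma_1} \ge \min_t \sigma_1^2(t)\,C_{BM} + \varepsilon^2\,\mathrm{Id}$ and
(A.5), (A.2), we obtain

$\|C_{\sigma_1}^{-1/2}(C_{\sigma_2} - C_{\sigma_1})C_{\sigma_1}^{-1/2}\|_{HS}$

$\quad \le \|\sigma_1^2 - \sigma_2^2\|_\infty\,\|C_{\sigma_1}^{-1/2} C_{BM} C_{\sigma_1}^{-1/2}\|_{HS}$

$\quad \le \|\sigma_1^2 - \sigma_2^2\|_\infty\,\bigl\|(\min_t \sigma_1^2(t)\,C_{BM} + \varepsilon^2\,\mathrm{Id})^{-1/2}\, C_{BM}\, (\min_t \sigma_1^2(t)\,C_{BM} + \varepsilon^2\,\mathrm{Id})^{-1/2}\bigr\|_{HS}$

$\quad = \|\sigma_1^2 - \sigma_2^2\|_\infty\,\|H(C_{BM})\|_{HS},$

employing functional calculus with $H(x) = (\min_t \sigma_1^2(t)\,x + \varepsilon^2)^{-1}x$. The
spectral properties of $C_{BM}$ imply that $H(C_{BM})$ has eigenfunctions $e_k(t) =
\sqrt{2}\sin(\pi(k - 1/2)t)$, $k \ge 1$, with eigenvalues

$\lambda_k = \dfrac{4}{4\min_t \sigma_1^2(t) + (2k-1)^2\pi^2\varepsilon^2},$

whence its Hilbert-Schmidt norm is $\|(\lambda_k)\|_{\ell^2} \sim \max_t \sigma_1^{-3/2}(t)\,\varepsilon^{-1/2}$
[use $\sum_k (s + k^2\varepsilon^2)^{-2} \sim \varepsilon^{-1}s^{-3/2}$].

This yields the result. □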
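
The order of the Hilbert-Schmidt norm at the end of the proof can be made explicit: for $\lambda_k = 4/(4\underline\sigma^2 + (2k-1)^2\pi^2\varepsilon^2)$ a Riemann sum comparison gives $\sum_k \lambda_k^2 \approx 1/(4\varepsilon\underline\sigma^3)$ for small $\varepsilon$, in line with the stated order $\underline\sigma^{-3/2}\varepsilon^{-1/2}$ of the norm. The constant $1/4$ is derived here from the integral $\int_0^\infty (1+u^2)^{-2}\,du = \pi/4$ and is not quoted from the paper; the function names and parameter values are illustrative.

```python
import math

def lambda_norm_sq(sigma_min_sq, eps, terms=200_000):
    """Sum over k of lambda_k^2 with lambda_k = 4 / (4 sigma_min^2 + (2k-1)^2 pi^2 eps^2)."""
    return sum((4.0 / (4.0 * sigma_min_sq + ((2 * k - 1) * math.pi * eps) ** 2)) ** 2
               for k in range(1, terms + 1))

def leading_order(sigma_min_sq, eps):
    """Integral approximation 1 / (4 eps sigma_min^3) of the sum above."""
    return 1.0 / (4.0 * eps * sigma_min_sq ** 1.5)

ratios = [lambda_norm_sq(s2, eps) / leading_order(s2, eps)
          for s2, eps in ((1.0, 1e-2), (0.25, 1e-2), (1.0, 1e-3))]
```

All ratios are close to one, confirming the scaling in both $\varepsilon$ and the lower volatility bound.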



A.4. Proof of Proposition 5.2. We only consider the case of odd indices
$k$; both cases are treated analogously. Grama and Nussbaum (2002) establish
in their Theorem 6.1 in conjunction with their Theorem 5.2 that $\mathcal{E}_{3,m}^{\mathrm{odd}}$ and
the Gaussian regression experiment $\mathcal{G}_{3,m}$ of observing

(A.11) $Y_k = v_\varepsilon s^2(kh) + I(\sigma_0^2(kh))^{-1/2}\gamma_k, \quad k \in A_m \text{ odd}, \quad \gamma_k \sim N(0,1) \text{ i.i.d.},$

are equivalent to experiments $\tilde{\mathcal{E}}_{3,m} = (\mathcal{Y}, \mathscr{G}, (P^m_{s^2})_{s^2\in C^\alpha(R)})$ and
$\tilde{\mathcal{G}}_{3,m} = (\mathcal{Y}, \mathscr{G}, (Q^m_{s^2})_{s^2\in C^\alpha(R)})$, respectively, on the same space $(\mathcal{Y}, \mathscr{G})$ such that

(A.12) $\sup_{s^2\in C^\alpha(R)} H^2(P^m_{s^2}, Q^m_{s^2}) \lesssim \ell^{-2\rho}$

holds for all $\rho < 1$.

To be precise, it must be checked that the regularity conditions (R1)-(R3)
of Grama and Nussbaum (2002) are satisfied for all values $\vartheta$. One
complication is that in our parametric model the laws $P_\vartheta$ and the Fisher
information $I(\vartheta)$ depend on $h_0$, which tends to infinity. Yet, inspecting the
proofs it becomes clear that the results remain valid if the score $l = l_{h_0}$
is multiplied by $h_0^{-1/2}$ and the Fisher information accordingly by $h_0^{-1}$ and
the localization is such that the parametric rate $\ell^{-1/2}$ (in our block length
notation) is attained, which is ensured by our choice in (5.3). Since $I(\vartheta) \sim h_0$
is a consequence of (5.2), it remains to check conditions (R1), (R2) of Grama
and Nussbaum (2002) adjusted to our setting. Our score is differentiable such
that with $Y_j \sim N(0, g_j(\vartheta))$, $g_j(\vartheta) = \vartheta + h_0^{-2}\pi^2 j^2$,

$l(\vartheta) = \frac12\sum_{j\ge 1}\dfrac{Y_j^2 - g_j(\vartheta)}{g_j(\vartheta)^2}, \qquad l'(\vartheta) = -\frac12\sum_{j\ge 1}\dfrac{2Y_j^2 - g_j(\vartheta)}{g_j(\vartheta)^3}.$

By the mean value theorem, (R1) requires $E_\vartheta[(l'(\vartheta) + \frac12 l(\vartheta)^2)^2] \lesssim h_0$
(expressed in the score). This follows here by direct moment evaluation using
$\sum_{j\ge 1} g_j(\vartheta)^{-p} \sim h_0\int_0^\infty \frac{dx}{(\vartheta + \pi^2 x^2)^p} \sim h_0$ for $p > 1/2$. For (R2), we have to
bound the $2\delta$-moment of $l(v)\sqrt{dP_v/dP_\vartheta}$ for $v$ in a neighborhood of $\vartheta$. By
the Cauchy-Schwarz inequality and the preceding arguments for $l$, it
suffices to bound the moments of $\sqrt{dP_v/dP_\vartheta}$, which are finite up to the order
$\max_j |1 - g_j(\vartheta)^2/g_j(v)^2|^{-1}$. For $v \to \vartheta$ this tends to infinity and (R2) can be
satisfied for any $\delta > 0$. Uniform bounds are always ensured over parameters
$\vartheta$ bounded away from zero and infinity.

In view of the independence among the experiments $(\tilde{\mathcal{E}}_{3,m})_m$ and equally
among the experiments $(\tilde{\mathcal{G}}_{3,m})_m$, we infer from (A.12) and (A.2)

/(eh)- 1 (ih)~ l \ 
sup H 2 [ (g) P£, (g) )<(£h)- 1 r 2 P<£- 1 v^ p v^. 



s 2 &C a (R) 



m=l m=l 



Since we assume ho = o(e^ 2a )/ 2a ^ the right-hand side tends to zero provided 

a p(l — 2a) 4pa p — a 

-1 + 2 h — H — = —r r > 

2a + 1 a 2a + 1 a(2a + 1) 

holds. Since $\rho < 1$ is arbitrary, this is always satisfied for $\alpha < 1$. In the case
$\alpha = 1$, we use $h_0 \lesssim \varepsilon^{-p}$ for some $p < 1/2$. We have derived asymptotic
equivalence between the product experiments $\bigotimes_m \mathcal{E}_{3,m}^{\mathrm{odd}}$ and $\bigotimes_m \mathcal{G}_{3,m}$. A fortiori,
applying the Brown and Low (1996) result, this leads to asymptotic
equivalence between observing $(y_{jk})$ in experiments $\mathcal{E}_{2,\mathrm{loc}}$ and the corresponding
Gaussian shift models of observing

(A.13) $dY_t = I(\sigma_0^2(t))^{1/2} v_\varepsilon s^2(t)\,dt + (2h)^{1/2}\,dW_t, \quad t \in [0,1].$

From the explicit form (5.2) of the Fisher information, we infer for $h_0 \to \infty$

2^3/2 



he 



-1(0) 



1 1 

4 + 2ti 1 / 2 h 



< 



-aho 



Consequently, by the polynomial growth of $h_0$ in $\varepsilon^{-1}$, the Kullback-Leibler
divergence between the observation laws from (A.13) and the model $\mathcal{G}_{3,\mathrm{loc}}$
converges to zero. This gives the result.

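A.5. Proof of Proposition 6.1. Since the observations $y_{jk}$ for $j \ge 1$ are
the same in $y$ and $\tilde y$, we can work conditionally on those. Moreover, it
suffices to consider only the event $\Omega_\varepsilon := \{\|\hat\sigma_\varepsilon^2 - \sigma^2\|_\infty \le Rv_\varepsilon\}$ because the
squared Hellinger distance satisfies by conditioning and restriction to $\Omega_\varepsilon$
(with density functions $f$ and further obvious notation)

$H^2(\mathcal{L}(y), \mathcal{L}(\tilde y)) = \int\Bigl(\sqrt{f_{y|(y_{jk})_{j\ge1,k}}\, f_{(y_{jk})_{j\ge1,k}}} - \sqrt{f_{\tilde y|(y_{jk})_{j\ge1,k}}\, f_{(y_{jk})_{j\ge1,k}}}\Bigr)^2$

$\quad = E\bigl[H^2\bigl(\mathcal{L}((y_{0k})_k \mid (y_{jk})_{j\ge1,k}), \mathcal{L}((\tilde y_{0k})_k \mid (y_{jk})_{j\ge1,k})\bigr)\bigr]$

$\quad \le E\bigl[H^2\bigl(\mathcal{L}((y_{0k})_k \mid (y_{jk})_{j\ge1,k}), \mathcal{L}((\tilde y_{0k})_k \mid (y_{jk})_{j\ge1,k})\bigr)\mathbf{1}_{\Omega_\varepsilon}\bigr] + 2P(\Omega_\varepsilon^c)$

with $P(\Omega_\varepsilon^c) \to 0$. Conditional on $(y_{jk})_{j\ge1,k}$, both laws are Gaussian; $(y_{0k})_k$
has mean $\mu$ with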

Var^-jt 



k)k 



^ Var(y ifc 



-2/j0, 



J>1 V 



fVav^j^x) j+1 Vax(Pj ik -i] 



Var(y 



■(-i) J+1 y i>fc -i + 



Vax(y jt 



k-l) 



yjk 



for A; > 1 and covariance matrix £ with 



.7>1 V 



/Var(/3 iifc _i) , Var(^ fc ) 



+ 



Var(y J)fc _i) Var(y ifc 



cfcAfc'£ 2 y^(-i, 

L0, 



7 - +1 e 2 VarQ3j |fc _i) _ £^ 
Var(y jifc _i) 2 ' 



if k' = k, 

if # = Aril, 
otherwise,
where $c_k := 1 \vee (2 - k) \in \{1,2\}$. Conditional mean $\tilde\mu$ and covariance matrix
$\tilde\Sigma$ of $(\tilde y_{0k})_k$ have the same representation, but replacing $\operatorname{Var}$ each time by
$\operatorname{Var}_\varepsilon$, compare (6.3).



Prom var(y J fcj = (1 + 2 j 2 a 2 (kh)) 1 , we infer for h$ — > oo by Riemann 
sum approximation 

./ Var(^-, fc -i) | Var(/3 jfc ) 
^ V Var (%\fc-i) Var(y jfc ) 



E 



+ i 2 V 



oo, 



E 

3>1 



Var(/3,- fc 



Var(y 



j>3 



1. 



-(l + (2j) 2 ^ 2 )(l + (2i + l) 2 ^ 

Hence, $\Sigma$ is a matrix with entries of order $\varepsilon^2 h_0$ on the main diagonal and
entries of order $\varepsilon^2$ on the two adjacent diagonals. A simple Cauchy-Schwarz
argument therefore shows $\langle\Sigma v, v\rangle \gtrsim (\varepsilon^2 h_0 - \varepsilon^2)\|v\|^2 \sim \varepsilon^2 h_0\|v\|^2$ for $h_0 \to \infty$,
which implies $\Sigma \gtrsim \varepsilon h\,\mathrm{Id}$ in matrix order. Combining this with the Hellinger
bound (A.9), we arrive at the estimate

^H 2 (C((yo k ) k \(y jk )j>i,k)X((yok)k\{yjk)j>i,k))} 



< E 



eh 



+ 



I s ~ s IIhs 
e 2 h 2 



< 



7>l,fc V 



(Yai(f3 jk ) V&T £ ((3 jk )\ 2 Vai{y jk ) 



+ E 

3>hk 



\&r(y jk ) V&r e (y jk )J eh 
e 2 Var(/3 jfc ) e 2 Var £ ((3 jk ^ 



Var(y jk ) Var e (y j 



/A- J 



7 has derivative G'(z) 

\\&j k \\ z+e* V ' 



ll<M|V 



l$jfcll 2 e 2 |i-»- z l 



The function G(z) 
thus satisfies uniformly over all z bounded away from zero \G(w) — G(z)\ < 

2 I < ?;,. and llO.-i.ll ~ h/j, we thus find the 

v 2 mm(h /j,j/h ) 4 . 



yr. Inserting \<j —a, 



uniform bound on f2 £ 

{Vax(Pj k ) Vax e (P jk ] 



< 



t> £ and \\$jk\ 
v 2 e e^lf 



Vax(y jk ) Var £ (y jk )J ~{e 2 + h?/j 2 ) A 
Putting the estimates together, we arrive at 

H 2 (C(y),C(y))<v 2 £ Y, ^m(ho/j,j/h o y 
3>hk 



1 + hlH 
h 



1 

+ V 2 

"o 



< 2 V 2 / l - 1 ^min(/ lo /i,j7/ i o) 2 /i 1 + F (^) 



v 2 e h^e- 1
such that the Hellinger distance tends to zero uniformly if Hq = o(e),
which is ensured by our choice of $h_0$. This implies asymptotic equivalence of
observing $y$ and $\tilde y$ and thus of experiment $\mathcal{E}_2$ and of just observing $(y_{jk})_{j\ge1,k}$
in $\mathcal{E}_2$. By independence, the latter is equivalent to $\mathcal{E}_{2,\mathrm{odd}} \otimes \mathcal{E}_{2,\mathrm{even}}$.